Chapter 1

Introduction to Concise Signal Models 1 



1.1 Overview 

In characterizing a given problem in signal processing, one is often able to specify a model for the signals 
to be processed. This model may distinguish (either statistically or deterministically) classes of interesting 
signals from uninteresting ones, typical signals from anomalies, information from noise, etc. 

Very commonly, models in signal processing deal with some notion of structure, constraint, or conciseness. 
Roughly speaking, one often believes that a signal has "few degrees of freedom" relative to the size of the 
signal. This notion of conciseness is a very powerful assumption, and it suggests the potential for dramatic 
gains via algorithms that capture and exploit the true underlying structure of the signal. 

In these modules, we survey three common examples of concise models: linear models, sparse nonlinear 
models, and manifold-based models. In each case, we discuss an important phenomenon: the conciseness 
of the model corresponds to a low-dimensional geometric structure along which the signals of interest tend 
to cluster. This low-dimensional geometry again has important implications in the understanding and the 
development of efficient algorithms for signal processing. 

We discuss this low-dimensional geometry in several contexts, including projecting a signal onto the 
model class (i.e., forming a concise approximation to a signal), encoding such an approximation (i.e., data 
compression), and reducing the dimensionality of signals and data sets. We conclude with an important and 
emerging application area known as Compressed Sensing (CS), which is a novel method for data acquisition 
that relies on concise models and builds upon strong geometric principles. We discuss CS in its traditional, 
sparsity-based context and also discuss extensions of CS to other concise models such as manifolds. 

1.2 General Mathematical Preliminaries 

1.2.1 Signal notation 

We will treat signals as real- or complex-valued functions having domains that are either discrete (and finite) or continuous (and either compact or infinite). Each of these assumptions will be made clear as needed. As a general rule, however, we will use $x$ to denote a discrete signal in $\mathbb{R}^N$ and $f$ to denote a function over a continuous domain $\mathcal{D}$. We also commonly refer to these as discrete- or continuous-time signals, though the domain need not actually be temporal in nature.



1 This content is available online at <http://cnx.org/content/m18720/1.5/>.




1.2.2 $L_p$ and $\ell_p$ norms

As measures for signal energy, fidelity, or sparsity, we will employ the $L_p$ and $\ell_p$ norms. For continuous-time functions, the $L_p$ norm is defined as

$$\|f\|_{L_p(\mathcal{D})} = \left( \int_{\mathcal{D}} |f|^p \right)^{1/p}, \quad p \in (0, \infty), \tag{1.1}$$

and for discrete-time functions, the $\ell_p$ norm is defined as

$$\|x\|_{\ell_p} = \begin{cases} \left( \sum_{i=1}^{N} |x(i)|^p \right)^{1/p}, & p \in (0, \infty), \\ \max_{i=1,\dots,N} |x(i)|, & p = \infty, \\ \sum_{i=1}^{N} 1_{x(i) \neq 0}, & p = 0, \end{cases} \tag{1.2}$$

where $1$ denotes the indicator function. (While we often refer to these measures as "norms," they actually do not meet the technical criteria for norms when $p < 1$.)
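As a quick check of (1.2), here is a minimal NumPy sketch (NumPy is our assumption; the helper name `lp_norm` is hypothetical, not from the text). The last two lines illustrate the remark above: for $p < 1$ the triangle inequality fails.

```python
import numpy as np

def lp_norm(x, p):
    """The l_p "norm" of a discrete signal x, following (1.2)."""
    x = np.asarray(x, dtype=float)
    if p == 0:
        # Sum of the indicator 1_{x(i) != 0}, i.e., the number of nonzero entries.
        return float(np.count_nonzero(x))
    if np.isinf(p):
        return float(np.max(np.abs(x)))
    if p < 0:
        raise ValueError("p must be 0, positive, or inf")
    return float(np.sum(np.abs(x) ** p) ** (1.0 / p))

x = np.array([3.0, -4.0, 0.0, 0.0])
print(lp_norm(x, 2), lp_norm(x, 1), lp_norm(x, np.inf), lp_norm(x, 0))  # 5.0 7.0 4.0 2.0

# For p < 1 the triangle inequality fails, so l_p is not a true norm:
u, v = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(lp_norm(u + v, 0.5), lp_norm(u, 0.5) + lp_norm(v, 0.5))  # 4.0 2.0
```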

1.2.3 Linear algebra 

Let $A$ be a real-valued $M \times N$ matrix. We denote the nullspace of $A$ as $\mathcal{N}(A)$ (note that $\mathcal{N}(A)$ is a linear subspace of $\mathbb{R}^N$), and we denote the transpose of $A$ as $A^T$.

We call $A$ an orthoprojector from $\mathbb{R}^N$ to $\mathbb{R}^M$ if it has orthonormal rows. For such a matrix, we call $A^T A$ the corresponding orthogonal projection operator onto the $M$-dimensional subspace of $\mathbb{R}^N$ spanned by the rows of $A$.
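A small numerical sketch of these definitions, assuming NumPy: orthonormalizing random vectors with a QR factorization (our choice of construction) yields an orthoprojector $A$, and $A^T A$ then behaves as an orthogonal projection (idempotent and symmetric, with residuals orthogonal to the rows of $A$).

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 6, 2

# Orthonormalize M random vectors in R^N; Q is N x M with orthonormal columns.
Q, _ = np.linalg.qr(rng.standard_normal((N, M)))
A = Q.T            # M x N orthoprojector: orthonormal rows

P = A.T @ A        # N x N orthogonal projection onto the row space of A

x = rng.standard_normal(N)
x_proj = P @ x

print(np.allclose(A @ A.T, np.eye(M)))              # rows are orthonormal
print(np.allclose(P @ P, P), np.allclose(P, P.T))   # idempotent and symmetric
print(np.allclose(A @ (x - x_proj), 0))             # residual is orthogonal to the rows of A
```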



Chapter 2 

Signal Dictionaries and Representations 1 



For a wide variety of signal processing applications (including analysis, compression, noise removal, and so on) it is useful to consider the representation of a signal in terms of some dictionary [80]. In general, a dictionary $\Psi$ is simply a collection of elements drawn from the signal space whose linear combinations can be used to represent or approximate signals.

Considering, for example, signals in $\mathbb{R}^N$, we may collect and represent the elements of the dictionary $\Psi$ as an $N \times Z$ matrix, which we also denote as $\Psi$. From this dictionary, a signal $x \in \mathbb{R}^N$ can be constructed as a linear combination of the elements (columns) of $\Psi$. We write

$$x = \Psi \alpha \tag{2.1}$$

for some $\alpha \in \mathbb{R}^Z$. (For much of our notation in this section, we concentrate on signals in $\mathbb{R}^N$, though the basic concepts translate to other vector spaces.)

Dictionaries appear in a variety of settings. The most common may be the basis, in which case $\Psi$ has exactly $N$ linearly independent columns, and each signal $x$ has a unique set of expansion coefficients $\alpha = \Psi^{-1} x$. The orthonormal basis (where the columns are normalized and orthogonal) is also of particular interest, as the unique set of expansion coefficients $\alpha = \Psi^{-1} x = \Psi^T x$ can be obtained as the inner products of $x$ against the columns of $\Psi$. That is, $\alpha(i) = \langle x, \psi_i \rangle$, $i = 1, 2, \dots, N$, which gives us the expansion

$$x = \sum_{i=1}^{N} \langle x, \psi_i \rangle \psi_i. \tag{2.2}$$

We also have that $\|x\|_2^2 = \sum_{i=1}^{N} \langle x, \psi_i \rangle^2$.
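The orthonormal-basis expansion (2.2) and the energy identity can be checked numerically. This sketch assumes NumPy and uses a random orthonormal basis from a QR factorization as a stand-in for $\Psi$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))  # columns psi_i: an orthonormal basis

x = rng.standard_normal(N)
alpha = Psi.T @ x        # alpha(i) = <x, psi_i>, since Psi^{-1} = Psi^T
x_rec = Psi @ alpha      # x = sum_i <x, psi_i> psi_i, as in (2.2)

print(np.allclose(x_rec, x))                          # perfect reconstruction
print(np.isclose(np.sum(alpha ** 2), np.sum(x ** 2)))  # ||x||_2^2 = sum_i <x, psi_i>^2
print(np.allclose(alpha, np.linalg.solve(Psi, x)))    # agrees with Psi^{-1} x
```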

Frames are another special type of dictionary [75]. A dictionary $\Psi$ is a frame if there exist numbers $A$ and $B$, $0 < A \le B < \infty$, such that, for any signal $x$,

$$A \|x\|_2^2 \le \sum_z \langle x, \psi_z \rangle^2 \le B \|x\|_2^2. \tag{2.3}$$

The elements of a frame may be linearly dependent in general (see Figure 2.1), and so there may exist many ways to express a particular signal among the dictionary elements. However, frames do have a useful analysis/synthesis duality: for any frame $\Psi$ there exists a dual frame $\widetilde{\Psi}$ such that

$$x = \sum_z \langle x, \psi_z \rangle \widetilde{\psi}_z = \sum_z \langle x, \widetilde{\psi}_z \rangle \psi_z. \tag{2.4}$$
In the case where the frame vectors are represented as columns of the $N \times Z$ matrix $\Psi$, the matrix $\widetilde{\Psi}$ containing the dual frame elements is simply the transpose of the pseudoinverse of $\Psi$. A frame is called tight if the frame bounds $A$ and $B$ are equal. Tight frames have the special properties of (i) being their own dual frames (after a rescaling by $1/A$) and (ii) preserving norms up to the constant $A$, i.e., $\sum_z \langle x, \psi_z \rangle^2 = A \|x\|_2^2$. The remainder of this section discusses several important dictionaries.

1 This content is available online at <http://cnx.org/content/m18724/1.5/>.



Figure 2.1: A simple, redundant frame $\Psi$ containing three vectors that span $\mathbb{R}^2$.
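To make the frame definitions concrete, the sketch below (assuming NumPy) builds a redundant frame of three unit vectors in $\mathbb{R}^2$ at 120-degree spacing, in the spirit of Figure 2.1; this particular frame is our choice, not necessarily the one drawn in the figure. The frame bounds are read off as the extreme eigenvalues of $\Psi \Psi^T$, and the dual frame is computed as the transpose of the pseudoinverse, as described above.

```python
import numpy as np

# Three unit vectors in R^2 at 120-degree spacing; columns of Psi are the frame vectors.
angles = np.pi / 2 + 2 * np.pi * np.arange(3) / 3
Psi = np.vstack([np.cos(angles), np.sin(angles)])   # 2 x 3

# Frame bounds A, B: extreme eigenvalues of the frame operator Psi Psi^T.
eigs = np.linalg.eigvalsh(Psi @ Psi.T)
A, B = eigs.min(), eigs.max()

# Dual frame: transpose of the pseudoinverse of Psi.
Psi_dual = np.linalg.pinv(Psi).T

x = np.array([0.3, -1.2])
coeffs = Psi.T @ x            # analysis coefficients <x, psi_z>
x_rec = Psi_dual @ coeffs     # synthesis with the dual frame, first form of (2.4)

print(A, B)                                   # both approximately 1.5: a tight frame
print(np.allclose(x_rec, x))
print(np.allclose(Psi_dual, Psi / A))         # tight: dual frame = frame rescaled by 1/A
```

Because this frame is tight with $A = B = 3/2$, the dual frame is simply $\Psi$ scaled by $2/3$, and the energy of the analysis coefficients is exactly $3/2$ times $\|x\|_2^2$.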



2.1 The canonical basis 



The standard basis for representing a signal is the canonical (or "spike") basis. In $\mathbb{R}^N$, this corresponds to a dictionary $\Psi = I_N$ (the $N \times N$ identity matrix). When expressed in the canonical basis, signals are often said to be in the "time domain."



2.2 Fourier dictionaries 



The frequency domain provides one alternative representation to the time domain. The Fourier series and discrete Fourier transform are obtained by letting $\Psi$ contain complex exponentials and allowing the expansion coefficients $\alpha$ to be complex as well. (Such a dictionary can be used to represent real or complex signals.) A related "harmonic" transform to express signals in $\mathbb{R}^N$ is the discrete cosine transform (DCT), in which $\Psi$ contains real-valued, approximately sinusoidal functions and the coefficients $\alpha$ are real-valued as well.
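The sketch below (assuming NumPy; the matrix names are ours) constructs a unitary DFT dictionary and an orthonormal DCT-II dictionary for $\mathbb{R}^N$ and confirms that a real signal has complex DFT coefficients but real DCT coefficients. The DCT-II normalization used here is one common orthonormal convention.

```python
import numpy as np

N = 16
n = np.arange(N)
k = n.reshape(-1, 1)

# Unitary DFT dictionary: columns are complex exponentials.
F = np.exp(2j * np.pi * k * n / N) / np.sqrt(N)

# Orthonormal DCT-II: rows of C are sampled cosines; Psi_dct holds them as columns.
C = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)
C[0, :] = np.sqrt(1.0 / N)
Psi_dct = C.T

x = np.cos(2 * np.pi * 3 * n / N) + 0.5 * n / N   # a real test signal

alpha_dft = F.conj().T @ x     # complex expansion coefficients
alpha_dct = Psi_dct.T @ x      # real expansion coefficients

print(np.allclose(F.conj().T @ F, np.eye(N)))            # DFT dictionary is unitary
print(np.allclose(Psi_dct.T @ Psi_dct, np.eye(N)))       # DCT dictionary is orthonormal
print(np.iscomplexobj(alpha_dft), np.isrealobj(alpha_dct))
print(np.allclose(F @ alpha_dft, x), np.allclose(Psi_dct @ alpha_dct, x))
print(np.allclose(alpha_dft, np.fft.fft(x, norm="ortho")))  # same as NumPy's orthonormal FFT
```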



2.3 Wavelets 

Closely related to the Fourier transform, wavelets provide a framework for localized harmonic analysis 
of a signal [80]. Elements of the discrete wavelet dictionary are local, oscillatory functions concentrated 
approximately on dyadic supports and appear at a discrete collection of scales, locations, and (if the signal 
dimension D > 1) orientations. 



2.3.1 Scale 

In wavelet analysis and other settings, we will frequently refer to a particular scale of analysis for a signal. Consider, for example, continuous-time functions $f$ defined over the domain $\mathcal{D} = [0, 1]^D$. A dyadic hypercube $X_j \subseteq [0, 1]^D$ at scale $j \in \mathbb{N}$ is a domain that satisfies

$$X_j = \left[\beta_1 2^{-j}, (\beta_1 + 1) 2^{-j}\right] \times \cdots \times \left[\beta_D 2^{-j}, (\beta_D + 1) 2^{-j}\right] \tag{2.5}$$

with $\beta_1, \beta_2, \dots, \beta_D \in \{0, 1, \dots, 2^j - 1\}$. We call $X_j$ a dyadic interval when $D = 1$ or a dyadic square when $D = 2$ (see Figure 2.2). Note that $X_j$ has sidelength $2^{-j}$.
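Definition (2.5) can be enumerated directly. In this sketch (assuming NumPy; the generator name `dyadic_hypercubes` is hypothetical), each cube is returned as its lower and upper corners, so there are $2^{jD}$ cubes at scale $j$, each of sidelength $2^{-j}$.

```python
import itertools
import numpy as np

def dyadic_hypercubes(j, D):
    """Yield (lower, upper) corners of every dyadic hypercube X_j in [0,1]^D, per (2.5)."""
    side = 2.0 ** (-j)
    for beta in itertools.product(range(2 ** j), repeat=D):
        lower = np.array(beta) * side
        yield lower, lower + side

cubes = list(dyadic_hypercubes(j=2, D=2))
print(len(cubes))   # 2**(j*D) = 16 dyadic squares at scale j = 2
print(cubes[5])     # beta = (1, 1): corners (0.25, 0.25) and (0.5, 0.5)
```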




Figure 2.2: Dyadic partitioning of the unit square at scales j = 0,1,2. The partitioning induces a 
coarse-to-fine parent/child relationship that can be modeled using a tree structure. 



For discrete-time functions the notion of scale is similar. We can imagine, for example, a "voxelization" of the domain $[0, 1]^D$ ("pixelization" when $D = 2$), where each voxel has sidelength $2^{-B}$, $B \in \mathbb{N}$, and it takes $2^{BD}$ voxels to fill $[0, 1]^D$. The relevant scales of analysis for such a signal would simply be $j = 0, 1, \dots, B$, and each dyadic hypercube $X_j$ would refer to a collection of voxels.
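In the discrete setting, a dyadic square is just a block of pixels. Assuming a $2^B \times 2^B$ image stored as a NumPy array and mapping $\beta_1$ to rows (an arbitrary convention), the hypothetical helper below returns the $2^{B-j} \times 2^{B-j}$ block that corresponds to $X_j$.

```python
import numpy as np

def dyadic_block(img, j, beta):
    """Pixels of the dyadic square X_j indexed by beta = (beta_1, beta_2).

    img is assumed to be a 2^B x 2^B array (D = 2, a "pixelization" of [0,1]^2).
    """
    B = int(np.log2(img.shape[0]))
    w = 2 ** (B - j)   # a dyadic square at scale j spans 2^(B-j) pixels per side
    r, c = beta
    return img[r * w:(r + 1) * w, c * w:(c + 1) * w]

B = 4
img = np.arange(4 ** B).reshape(2 ** B, 2 ** B)   # a 16 x 16 "image"
for j in range(B + 1):
    print(j, dyadic_block(img, j, (0, 0)).shape)  # (16, 16), (8, 8), (4, 4), (2, 2), (1, 1)
```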



2.3.2 Wavelet fundamentals

The wavelet transform offers a multiscale decomposition of a function into a nested sequence of scaling spaces $V_0 \subset V_1 \subset \cdots \subset V_j \subset \cdots$. Each scaling space $V_j$ is spanned by a discrete collection of dyadic translations of a lowpass scaling function $\varphi_j$, and the difference between adjacent scaling spaces $V_j$ and $V_{j+1}$ is spanned by a discrete collection of dyadic translations of a bandpass wavelet function $\psi_j$. Figure 2.3 shows an example of this multiscale organization in the case of the Haar wavelet dictionary. Each wavelet function at scale $j$ is concentrated approximately on some dyadic hypercube $X_j$, and between scales, both the wavelets and scaling functions are "self-similar," differing only by rescaling and dyadic dilation. When $D > 1$, the difference spaces are partitioned into $2^D - 1$ distinct orientations (when $D = 2$ these correspond to vertical, horizontal, and diagonal directions). The wavelet transform can be truncated at any scale $j$. We then let the basis $\Psi$ consist of all scaling functions at scale $j$ plus all wavelets at scales $j$ and finer.
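As one concrete instance, the Haar case can be sketched in a few lines of NumPy (assuming a signal length divisible by $2^{\text{levels}}$; the function names are ours). Stopping after a given number of levels corresponds to truncating the transform: the output holds scaling coefficients at the coarsest retained scale plus wavelet coefficients at that scale and all finer ones.

```python
import numpy as np

def haar_dwt(x, levels):
    """Orthonormal 1-D Haar wavelet transform.

    Returns [a_J, d_J, ..., d_1]: coarsest scaling coefficients, then wavelet
    coefficients ordered from coarse to fine.
    """
    a = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # wavelet (difference-space) coefficients
        a = (even + odd) / np.sqrt(2)              # scaling coefficients, next coarser scale
    return [a] + details[::-1]

def haar_idwt(coeffs):
    """Invert haar_dwt by undoing one level at a time, coarse to fine."""
    a = coeffs[0]
    for d in coeffs[1:]:
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

x = np.random.default_rng(2).standard_normal(64)
coeffs = haar_dwt(x, levels=3)
print([len(c) for c in coeffs])                                       # [8, 8, 16, 32]
print(np.allclose(haar_idwt(coeffs), x))                              # perfect reconstruction
print(np.isclose(sum(np.sum(c ** 2) for c in coeffs), np.sum(x ** 2)))  # energy preserved
```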
Figure 2.3: Multiscale wavelet representations on the interval $[0, 1]$. (a) Haar scaling functions spanning $V_j$ with $j = 2$. (b) Haar wavelet functions spanning the difference space between $V_j$ and $V_{j+1}$. (c) Haar scaling functions spanning $V_{j+1}$. (d) Two example functions belonging to the spaces (left) $V_j$ and (right) $V_{j+1}$.



Wavelets are essentially bandpass functions that detect abrupt changes in a signal. The scale of a wavelet, which controls its support both in time and in frequency, also controls its sensitivity to changes in the signal. This is made more precise by considering the wavelet analysis of smooth signals. Wavelets are often characterized by their number of vanishing moments; a wavelet basis function is said to have $H$ vanishing moments if it is orthogonal to (its inner product is zero against) every polynomial of degree less than $H$. Section 4.2 (Sparse (nonlinear) models) discusses further the wavelet analysis of smooth and piecewise smooth signals.
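A discrete proxy for vanishing moments is the set of moments $\sum_k k^m g[k]$ of a wavelet's highpass filter $g$. The sketch below (assuming NumPy) compares the Haar filter, whose wavelet has one vanishing moment, with the Daubechies-4 filter ("db2"), whose wavelet has two. The quadrature-mirror construction $g[k] = (-1)^k h[L-1-k]$ is one standard convention; others differ by sign or shift, which does not change which moments vanish.

```python
import numpy as np

s3 = np.sqrt(3.0)
h_haar = np.array([1.0, 1.0]) / np.sqrt(2)                             # Haar lowpass filter
h_db2 = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2))  # Daubechies-4 lowpass

def highpass(h):
    """Quadrature-mirror highpass filter g[k] = (-1)^k h[L-1-k]."""
    L = len(h)
    return np.array([(-1) ** k * h[L - 1 - k] for k in range(L)])

def discrete_moments(g, max_order):
    """Moments sum_k k^m g[k] for m = 0, ..., max_order."""
    k = np.arange(len(g))
    return [float(np.sum(k ** m * g)) for m in range(max_order + 1)]

m_haar = discrete_moments(highpass(h_haar), 2)
m_db2 = discrete_moments(highpass(h_db2), 2)
print(np.round(m_haar, 6))   # moment 0 vanishes; moment 1 does not
print(np.round(m_db2, 6))    # moments 0 and 1 vanish; moment 2 does not
```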

The dyadic organization of the wavelet transform lends itself to a multiscale, tree-structured organization of the wavelet coefficients. Each "parent" function, concentrated on a dyadic hypercube $X_j$ of sidelength $2^{-j}$, has $2^D$ "children" whose supports are concentrated on the dyadic subdivisions of $X_j$. This relationship can be represented in a top-down tree structure, as demonstrated in Figure 2.2. Because the parent and children share a location, they will presumably measure related phenomena about the signal, and so in general, any patterns in their wavelet coefficients tend to be reflected in the connectivity of the tree structure. Figure 2.4 and Figure 2.5 show an example of the wavelet transform applied to the Cameraman test image; since the dimension $D = 2$, each scale is partitioned into vertical, horizontal, and diagonal wavelet subbands, and each parent coefficient has $2^D = 4$ children.
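The parent/child relationship can be expressed purely in terms of the indices $\beta$ from (2.5): the children of the cube at scale $j$ with index $\beta$ are the cubes at scale $j+1$ with indices $2\beta + e$ for $e \in \{0, 1\}^D$. A minimal sketch in plain Python (the helper names are hypothetical). For a 2-D wavelet array laid out as in Figure 2.5, the same rule is commonly used to map a coefficient in one orientation subband to its four children in the same-orientation subband at the next finer scale.

```python
import itertools

def children(j, beta):
    """(scale, index) of the 2^D children of the dyadic hypercube X_j with index beta."""
    return [(j + 1, tuple(2 * b + e for b, e in zip(beta, offsets)))
            for offsets in itertools.product((0, 1), repeat=len(beta))]

def parent(j, beta):
    """(scale, index) of the parent of the dyadic hypercube X_j with index beta."""
    return (j - 1, tuple(b // 2 for b in beta))

kids = children(1, (1, 0))   # D = 2: a dyadic square at scale 1 has 4 children
print(kids)                  # [(2, (2, 0)), (2, (2, 1)), (2, (3, 0)), (2, (3, 1))]
print(all(parent(*kid) == (1, (1, 0)) for kid in kids))   # True
```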




Figure 2.4: Cameraman test image (size 256 x 256) for use in wavelet decomposition and approximation 
examples. 




Figure 2.5: Wavelet analysis of the Cameraman test image. (a) One-level wavelet transform, where the $N$-pixel image is transformed into four sets of $N/4$ coefficients each. The top left quadrant represents the scaling coefficients at the next coarser scale (relative to the scale of pixelization). The remaining quadrants represent the wavelet coefficients from the difference spaces, partitioned into the vertical, horizontal, and diagonal subbands. (b) Three-level wavelet transform, where the wavelet decomposition has been iterated twice more on the scaling coefficients. The multiple scales of wavelet coefficients exhibit a parent-child dependency. The largest coefficients tend to concentrate at the coarsest scales and around high-frequency features such as edges in the image.



In addition to their ease of modeling, wavelets are computationally attractive for signal processing; using a filter bank, the wavelet transform of an $N$-voxel signal can be computed in just $O(N)$ operations.
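To illustrate the quadrant structure of Figure 2.5(a) and the linear cost, here is a one-level separable 2-D Haar transform in NumPy (the Haar choice, the subband names, and the random stand-in image are our assumptions). Each level touches every input sample a constant number of times, and iterating on the scaling subband processes $N + N/4 + N/16 + \cdots < 4N/3$ samples, which is consistent with the $O(N)$ count.

```python
import numpy as np

def haar2d_level(img):
    """One level of a separable orthonormal 2-D Haar transform (even-sized array).

    Returns four subbands of N/4 coefficients each: the scaling subband LL and
    three wavelet orientations LH, HL, HH (naming conventions for LH/HL vary).
    """
    x = np.asarray(img, dtype=float)
    # Lowpass/highpass filtering and downsampling along axis 0 ...
    lo = (x[0::2, :] + x[1::2, :]) / np.sqrt(2)
    hi = (x[0::2, :] - x[1::2, :]) / np.sqrt(2)
    # ... then along axis 1.
    LL = (lo[:, 0::2] + lo[:, 1::2]) / np.sqrt(2)
    LH = (lo[:, 0::2] - lo[:, 1::2]) / np.sqrt(2)
    HL = (hi[:, 0::2] + hi[:, 1::2]) / np.sqrt(2)
    HH = (hi[:, 0::2] - hi[:, 1::2]) / np.sqrt(2)
    return LL, LH, HL, HH

img = np.random.default_rng(3).standard_normal((256, 256))  # stand-in for a 256 x 256 image
bands = haar2d_level(img)
print([b.shape for b in bands])                                        # four (128, 128) subbands
print(np.isclose(sum(np.sum(b ** 2) for b in bands), np.sum(img ** 2)))  # energy preserved
```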

2.4 Other dictionaries 



A wide variety of other dictionaries have been proposed in signal processing and harmonic analysis. As one example, complex-valued wavelet transforms have proven useful for image analysis and modeling [72], [73], [94], [65], [102], [91], [66], thanks to a phase component that captures location information at each scale. Just a few of the other harmonic dictionaries popular in image processing include wavelet packets [80], Gabor atoms [80], curvelets [29], [18], and contourlets [50], [51], all of which involve various space-frequency partitions. We mention additional dictionaries in Chapter 6 (Compression).
Chapter 3

Manifolds 1



As we will soon discuss, manifold models can provide an alternative to signal dictionaries as a framework 
for concise signal modeling. In this module, we present a minimal set of definitions and terminology from 
differential geometry and topology that serve as an introduction to manifolds. We refer the reader to the 
introductory and classical texts [90], [86], [68], [11] for more depth and technical precision. 

3.1 General terminology 

A $K$-dimensional manifold $\mathcal{M}$ is a topological space 2 that is locally homeomorphic 3 to $\mathbb{R}^K$ [68]. This means that there exists an open cover of $\mathcal{M}$ with each such open set mapping homeomorphically to an open ball in $\mathbb{R}^K$. Each such open set, together with its mapping to $\mathbb{R}^K$, is called a chart; the set of all charts of a manifold is called an atlas.

The general definition of a manifold makes no reference to an ambient space in which the manifold lives. However, as we will often be making use of manifolds as models for sets of signals, it follows that such "signal manifolds" are actually subsets of some larger space (for example, of $L_2(\mathbb{R})$ or $\mathbb{R}^N$). In general, we may think of a $K$-dimensional submanifold embedded in $\mathbb{R}^N$ as a nonlinear, $K$-dimensional "surface" within $\mathbb{R}^N$.

3.2 Examples of manifolds 

One of the simplest examples of a manifold is the circle in $\mathbb{R}^2$. A small, open-ended segment cut from the circle could be stretched out and associated with an open interval of the real line (see Figure 3.1). Hence, the circle is a 1-D manifold. (We note that at least two charts are required to form an atlas for the circle, as the entire circle itself cannot be mapped homeomorphically to an open interval in $\mathbb{R}^1$.)
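The two-chart atlas can be written down explicitly. In this NumPy sketch (our construction, chosen to match the idea of Figure 3.1), $U_1$ omits the point $(1, 0)$ and $U_2$ omits $(-1, 0)$; each chart maps its open set onto an open interval via the polar angle. The transition map between the charts is either the identity or a shift by $2\pi$, so it is differentiable.

```python
import numpy as np

def phi1(p):
    """Chart on U1 = circle minus (1, 0): polar angle in (0, 2*pi)."""
    return np.mod(np.arctan2(p[1], p[0]), 2 * np.pi)

def phi2(p):
    """Chart on U2 = circle minus (-1, 0): polar angle in (-pi, pi)."""
    return np.arctan2(p[1], p[0])

def to_circle(t):
    """Inverse of either chart: angle -> point on the unit circle."""
    return np.array([np.cos(t), np.sin(t)])

t = np.linspace(0.1, 2 * np.pi - 0.1, 50)        # angles inside phi1's interval
s = np.linspace(-np.pi + 0.1, np.pi - 0.1, 50)   # angles inside phi2's interval
ok1 = np.allclose(phi1(to_circle(t)), t)
ok2 = np.allclose(phi2(to_circle(s)), s)
print(ok1, ok2)   # each chart is inverted by to_circle on its own interval

# Transition map phi1 o phi2^{-1} on the overlap (s != 0): s for s > 0, s + 2*pi for s < 0.
overlap = s[np.abs(s) > 1e-9]
transition = phi1(to_circle(overlap))
ok3 = np.allclose(transition, np.where(overlap > 0, overlap, overlap + 2 * np.pi))
print(ok3)
```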



1 This content is available online at <http://cnx.org/content/m18722/1.4/>.

2 A topological space is simply a set X, together with a collection T of subsets of X called open sets, such that: (i) the 
empty set belongs to T, (ii) X belongs to T, (iii) arbitrary unions of elements of T belong to T, and (iv) finite intersections of 
elements of T belong to T. 

3 A homeomorphism is a function between two topological spaces that is one-to-one, onto, continuous, and has a continuous 
inverse. 
Figure 3.1: A circle is a manifold because there exists an open cover consisting of the sets $U_1, U_2$, which are mapped homeomorphically onto open intervals in the real line via the functions $\varphi_1, \varphi_2$. (It is not necessary that the intervals intersect in $\mathbb{R}$.)



We refer the reader to [92] for an excellent overview of several manifolds with relevance to signal processing, including the rotation group $SO(3)$, which can be used for representing orientations of objects in 3-D space, and the Grassmann manifold $G(K, N)$, which represents all $K$-dimensional subspaces of $\mathbb{R}^N$. (Without working through the technicalities of the definition of a manifold, it is easy to see that both types of data have a natural notion of neighborhood.)

3.3 Tangent spaces 

A manifold is differentiable if, for any two charts whose open sets on $\mathcal{M}$ overlap, the composition of the corresponding homeomorphisms (from $\mathbb{R}^K$ in one chart to $\mathcal{M}$ and back to $\mathbb{R}^K$ in the other) is differentiable. (In our simple example, the circle is a differentiable manifold.)

To each point $x$ in a differentiable manifold, we may associate a $K$-dimensional tangent space $\mathrm{Tan}_x$. For signal manifolds embedded in $L_2$ or $\mathbb{R}^N$, it suffices to think of $\mathrm{Tan}_x$ as the set of all directional derivatives of smooth paths on $\mathcal{M}$ through $x$. (Note that $\mathrm{Tan}_x$ is a linear subspace and has its origin at $0$, rather than at $x$.)
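For a manifold embedded in $\mathbb{R}^N$, tangent vectors can be estimated as derivatives of smooth paths. The sketch below (assuming NumPy, the unit circle as the manifold, and a central finite difference with a step size of our choosing) shows that two different paths through the same point yield parallel tangent vectors, so $\mathrm{Tan}_x$ is 1-D, and that for this circle centered at the origin $\mathrm{Tan}_x$ happens to be orthogonal to $x$.

```python
import numpy as np

def tangent_estimate(path, h=1e-6):
    """Central-difference estimate of path'(0) for a smooth path with path(0) = x."""
    return (path(h) - path(-h)) / (2 * h)

t0 = 0.7
x = np.array([np.cos(t0), np.sin(t0)])

# Two different smooth paths on the unit circle passing through x at s = 0.
path1 = lambda s: np.array([np.cos(t0 + s), np.sin(t0 + s)])
path2 = lambda s: np.array([np.cos(t0 + 3 * s + s ** 2), np.sin(t0 + 3 * s + s ** 2)])

v1 = tangent_estimate(path1)
v2 = tangent_estimate(path2)
print(np.allclose(v1, [-np.sin(t0), np.cos(t0)], atol=1e-6))  # matches the analytic tangent
print(np.allclose(v2, 3 * v1, atol=1e-5))                     # parallel: Tan_x is 1-D
print(abs(v1 @ x) < 1e-6)                                     # orthogonal to x (special to this circle)
```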

3.4 Distances 

One is often interested in measuring distance along a manifold. For abstract differentiable manifolds, this can be accomplished by defining a Riemannian metric on the tangent spaces. A Riemannian metric is a collection of inner products $\langle \cdot, \cdot \rangle_x$ defined at each point $x \in \mathcal{M}$. The inner product gives a measure for the "length" of a tangent, and one can then compute the length of a path on $\mathcal{M}$ by integrating its tangent lengths along the path.

For differentiable manifolds embedded in $\mathbb{R}^N$, the natural metric is the Euclidean metric inherited from the ambient space. The length of a path $\gamma : [0, 1] \to \mathcal{M}$ can then be computed simply using the limit

$$\mathrm{length}(\gamma) = \lim_{j \to \infty} \sum_{i=1}^{j} \left\| \gamma(i/j) - \gamma((i-1)/j) \right\|_2. \tag{3.1}$$

The geodesic distance $d_{\mathcal{M}}(x, y)$ between two points $x, y \in \mathcal{M}$ is then given by the length of the shortest path $\gamma$ on $\mathcal{M}$ joining $x$ and $y$.
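The limit (3.1) can be approximated by inscribed polygons. In this NumPy sketch (the half-circle path is our example), the polygonal lengths increase toward $\pi$, the geodesic distance between $(1, 0)$ and $(-1, 0)$ on the unit circle, whereas their Euclidean distance is only $2$.

```python
import numpy as np

def polygonal_length(gamma, j):
    """The sum in (3.1) for fixed j: polygon through gamma(0), gamma(1/j), ..., gamma(1)."""
    pts = np.array([gamma(i / j) for i in range(j + 1)])
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

# Upper half of the unit circle, from (1, 0) to (-1, 0), parametrized over [0, 1].
gamma = lambda s: np.array([np.cos(np.pi * s), np.sin(np.pi * s)])

for j in (2, 8, 64, 1024):
    print(j, polygonal_length(gamma, j))   # increases toward pi as j grows
```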
3.5 Condition number

To establish a firm footing for analysis, we find it helpful to assume a certain regularity of the manifold beyond mere differentiability. For this purpose, we adopt the condition number defined recently by Niyogi et al. [87].

Definition 3.1:

[87] Let $\mathcal{M}$ be a compact submanifold of $\mathbb{R}^N$. The condition number of $\mathcal{M}$ is defined as $1/\tau$, where $\tau$ is the largest number having the following property: The open normal bundle about $\mathcal{M}$ of radius $r$ is imbedded in $\mathbb{R}^N$ for all $r < \tau$.

The open normal bundle of radius $r$ at a point $x \in \mathcal{M}$ is simply the collection of all vectors of length $< r$ anchored at $x$ and with direction orthogonal to $\mathrm{Tan}_x$.

In addition to controlling local properties (such as curvature) of the manifold, the condition number has 
a global effect as well, ensuring that the manifold is self-avoiding. These notions are made precise in several 
lemmata, which we repeat below for completeness. 

Lemma 3.1:

[87] If $\mathcal{M}$ is a submanifold of $\mathbb{R}^N$ with condition number $1/\tau$, then the norm of the second fundamental form is bounded by $1/\tau$ in all directions.

This implies that unit-speed geodesic paths on $\mathcal{M}$ have curvature bounded by $1/\tau$. The second lemma concerns the twisting of tangent spaces.

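Lemma 3.2:

[87] Let $\mathcal{M}$ be a submanifold of $\mathbb{R}^N$ with condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two points with geodesic distance given by $d_{\mathcal{M}}(p, q)$. Let $\theta$ be the angle between the tangent spaces $\mathrm{Tan}_p$ and $\mathrm{Tan}_q$ defined by $\cos(\theta) = \min_{u \in \mathrm{Tan}_p} \max_{v \in \mathrm{Tan}_q} |\langle u, v \rangle|$, where $u$ and $v$ range over unit vectors. Then $\cos(\theta) > 1 - \frac{1}{\tau} d_{\mathcal{M}}(p, q)$.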
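Lemma 3.2 can be sanity-checked on a simple example. For a circle of radius $R$ in $\mathbb{R}^2$ we take $\tau = R$ (normal segments of length $r < R$ do not collide until they meet at the center); this value is our assumption for the sketch. Because the tangent spaces are 1-D, the min/max in the lemma reduces to a single inner product of unit tangents. The check below (assuming NumPy) uses a small tolerance for floating-point rounding.

```python
import numpy as np

R = 2.0
tau = R   # assumed: a circle of radius R has condition number 1/R

def unit_tangent(a):
    """Unit tangent to the circle at angle a (spans the 1-D tangent space)."""
    return np.array([-np.sin(a), np.cos(a)])

rng = np.random.default_rng(4)
ok = True
for a, b in rng.uniform(0, 2 * np.pi, size=(1000, 2)):
    delta = abs(np.angle(np.exp(1j * (b - a))))   # angular separation in [0, pi]
    d_geo = R * delta                             # geodesic distance d_M(p, q)
    cos_theta = abs(unit_tangent(a) @ unit_tangent(b))
    ok = ok and bool(cos_theta >= 1 - d_geo / tau - 1e-12)
print(ok)   # Lemma 3.2 holds at every sampled pair
```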

The third lemma concerns self-avoidance of $\mathcal{M}$.

Lemma 3.3:

[87] Let $\mathcal{M}$ be a submanifold of $\mathbb{R}^N$ with condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two points such that $\|p - q\|_2 = d$. Then for all $d \le \tau/2$, the geodesic distance $d_{\mathcal{M}}(p, q)$ is bounded by

$$d_{\mathcal{M}}(p, q) \le \tau - \tau \sqrt{1 - 2d/\tau}.$$

From Lemma 3.3 we have an immediate corollary.

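Corollary 3.1:

Let $\mathcal{M}$ be a submanifold of $\mathbb{R}^N$ with condition number $1/\tau$. Let $p, q \in \mathcal{M}$ be two points such that $\|p - q\|_2 = d$. If $d \le \tau/2$, then $d \ge d_{\mathcal{M}}(p, q) - \frac{(d_{\mathcal{M}}(p, q))^2}{2\tau}$.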
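The corollary follows by rearranging the bound of Lemma 3.3. A short derivation (our own, not quoted from [87]); squaring in the second step is valid because the square root is nonnegative, so the right-hand side it is bounded by is nonnegative as well:

$$\begin{aligned}
d_{\mathcal{M}}(p,q) \le \tau - \tau\sqrt{1 - 2d/\tau}
&\iff \sqrt{1 - 2d/\tau} \le 1 - \frac{d_{\mathcal{M}}(p,q)}{\tau} \\
&\implies 1 - \frac{2d}{\tau} \le 1 - \frac{2\, d_{\mathcal{M}}(p,q)}{\tau} + \frac{d_{\mathcal{M}}(p,q)^2}{\tau^2} \\
&\iff d \ge d_{\mathcal{M}}(p,q) - \frac{d_{\mathcal{M}}(p,q)^2}{2\tau}.
\end{aligned}$$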
Chapter 4

Low-Dimensional Signal Models 1 



We now survey some common and important models in signal processing, each of which involves some notion 
of conciseness to the signal structure. We see in each case that this conciseness gives rise to a low-dimensional 
geometry within the ambient signal space. 

4.1 Linear models 

Some of the simplest models in signal processing correspond to linear subspaces of the ambient signal space. Bandlimited signals are one such example. Supposing, for example, that a $2\pi$-periodic signal $f$ has Fourier transform $F(\omega) = 0$ for $|\omega| > B$, the Shannon/Nyquist sampling theorem [81] states that such signals can be reconstructed from $2B$ samples. Because the space of $B$-bandlimited signals is closed under addition and scalar multiplication, it follows that the set of such signals forms a $2B$-dimensional linear subspace of $L_2([0, 2\pi))$.

Linear signal models also appear in cases where a model dictates a linear constraint on a signal. Considering a discrete length-$N$ signal $x$, for example, such a constraint can be written in matrix form as

$$Ax = 0 \tag{4.1}$$

for some $M \times N$ matrix $A$. Signals obeying such a model are constrained to live in $\mathcal{N}(A)$ (again, obviously, a linear subspace of $\mathbb{R}^N$).

A very similar class of models concerns signals living in an affine space, which can be represented for a discrete signal using

$$Ax = y. \tag{4.2}$$

The class of such $x$ lives in a shifted nullspace $\hat{x} + \mathcal{N}(A)$, where $\hat{x}$ is any solution to the equation $A\hat{x} = y$.
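Both models can be explored numerically. The NumPy sketch below (with a random $A$ of our choosing) builds an orthonormal basis for $\mathcal{N}(A)$ from the SVD, finds one particular solution $\hat{x}$ with least squares, and confirms that shifting $\hat{x}$ by any nullspace vector still satisfies (4.2).

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 3, 6
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

# Orthonormal basis for N(A): right singular vectors beyond rank(A).
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
null_basis = Vt[rank:].T                      # N x (N - rank), here 6 x 3

# One particular solution x_hat of A x = y (the least-norm solution).
x_hat = np.linalg.lstsq(A, y, rcond=None)[0]

# Any point of the affine set x_hat + N(A) also satisfies A x = y.
x = x_hat + null_basis @ rng.standard_normal(null_basis.shape[1])
print(null_basis.shape)                       # (6, 3)
print(np.allclose(A @ null_basis, 0))         # columns lie in the nullspace, as in (4.1)
print(np.allclose(A @ x, y))                  # the shifted nullspace solves (4.2)
```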

Revisiting the dictionary setting (see Chapter 2, Signal Dictionaries and Representations), one last important linear model arises in cases where we select $K$ specific elements from the dictionary $\Psi$ and then construct signals using linear combinations of only these $K$ elements; in this case the set of possible signals forms a $K$-dimensional hyperplane in the ambient signal space (see Figure 4.1(a)).



1 This content is available online at <http://cnx.org/content/m18726/1.4/>.
Figure 4.1: Simple models for signals in $\mathbb{R}^2$. (a) The linear space spanned by one element of the dictionary $\Psi$. The bold vectors denote the elements of the dictionary, while the dashed line (plus the corresponding dictionary element) denotes the subspace spanned by that dictionary element. (b) The nonlinear set of 1-sparse signals that can be built using $\Psi$. (c) A manifold $\mathcal{M}$.



For example, we may construct low-frequency signals using combinations of only the lowest frequency sinusoids from the Fourier dictionary. Similar subsets may be chosen from the wavelet dictionary; in particular, one may choose only elements that span a particular scaling space $V_j$. As we have mentioned previously, harmonic dictionaries such as sinusoids and wavelets are well-suited to representing smooth 2 signals. This can be seen in the decay of their transform coefficients. For example, we can relate the smoothness of a continuous 1-D function $f$ to the decay of its Fourier coefficients $F(\omega)$; in particular, if $\int |F(\omega)| (1 + |\omega|^H) \, d\omega < \infty$, then $f \in C^H$ [81]. In order to satisfy $\int |F(\omega)| (1 + |\omega|^H) \, d\omega < \infty$, a signal must have a sufficiently fast decay of the Fourier transform coefficients $|F(\omega)|$ as $\omega$ grows. Wavelet coefficients exhibit a similar decay for smooth signals: supposing $f \in C^H$ and the wavelet basis function has at least $H$ vanishing moments, then as the scale $j \to \infty$, the magnitudes of the wavelet coefficients decay as $2^{-j(H + 1/2)}$ [81]. (Recall that $f \in C^H$ implies $f$ is locally well-approximated by a polynomial, and so due to the vanishing moments this polynomial will have zero contribution to the wavelet coefficients.)
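The wavelet decay rate can be observed numerically. This NumPy sketch assumes the Haar wavelet, which has one vanishing moment, so the relevant smoothness exponent is $H = 1$ even though the test signal is infinitely smooth; the predicted decay is then $2^{-j(1 + 1/2)}$, i.e., the largest coefficient should shrink by roughly $2^{-3/2} \approx 0.354$ per step to a finer scale. Sampling at $N$ points and scaling by $1/\sqrt{N}$ is our approximation of the continuous-time coefficients (the ratios do not depend on that scaling).

```python
import numpy as np

def haar_details(x, levels):
    """Orthonormal Haar wavelet coefficients, finest scale first."""
    a = np.asarray(x, dtype=float)
    out = []
    for _ in range(levels):
        out.append((a[0::2] - a[1::2]) / np.sqrt(2))
        a = (a[0::2] + a[1::2]) / np.sqrt(2)
    return out

N = 2 ** 12
t = np.arange(N) / N
f = np.sin(2 * np.pi * t) / np.sqrt(N)   # smooth signal; 1/sqrt(N) mimics continuous-time inner products

peaks = [np.max(np.abs(d)) for d in haar_details(f, levels=6)]   # finest scale first
ratios = [peaks[i] / peaks[i + 1] for i in range(len(peaks) - 1)]
print(np.round(ratios, 3))   # each close to 2 ** -1.5, about 0.354
```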

Indeed, these results suggest that the largest Fourier or wavelet coefficients of smooth signals tend to concentrate at the coarsest scales (lowest frequencies). In Section 5.1 (Linear approximation), we see that linear approximations formed from just the lowest frequency elements of the Fourier or wavelet dictionaries (i.e., the truncation of the Fourier or wavelet representation to only the lowest frequency terms) provide very accurate approximations to smooth signals. Put differently, smooth signals live near the subspace spanned by just the lowest frequency Fourier or wavelet basis functions.
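A small experiment along these lines, assuming NumPy and an orthonormal DCT-II as the Fourier-type dictionary: keep only the $K$ lowest-frequency coefficients of a smooth Gaussian bump (our test signal) and measure the relative error. This previews, rather than reproduces, the analysis of Section 5.1.

```python
import numpy as np

N = 256
n = np.arange(N)
k = n.reshape(-1, 1)
C = np.sqrt(2.0 / N) * np.cos(np.pi * (n + 0.5) * k / N)   # rows: orthonormal DCT-II basis
C[0, :] = np.sqrt(1.0 / N)

t = (n + 0.5) / N
x = np.exp(-((t - 0.5) / 0.1) ** 2)   # smooth test signal (a Gaussian bump)
alpha = C @ x                          # DCT coefficients, lowest frequency first

def linear_approx_error(K):
    """Relative l2 error after keeping only the K lowest-frequency DCT coefficients."""
    alpha_K = np.where(np.arange(N) < K, alpha, 0.0)
    return np.linalg.norm(x - C.T @ alpha_K) / np.linalg.norm(x)

errs = [linear_approx_error(K) for K in (4, 8, 16, 32)]
print(errs)   # decreases rapidly as K grows
```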

4.2 Sparse (nonlinear) models 

Sparse signal models can be viewed as a generalization of linear models. The notion of sparsity comes from the fact that, by the proper choice of dictionary $\Psi$, many real-world signals $x = \Psi \alpha$ have coefficient vectors $\alpha$ containing few large entries, but across different signals the locations (indices in $\alpha$) of the large entries may change. We say a signal is strictly sparse (or "$K$-sparse") if all but $K$ entries of $\alpha$ are zero.

Some examples of real-world signals for which sparse models have been proposed include neural spike 
trains (in time), music and other audio recordings (in time and frequency), natural images (in the wavelet 
or curvelet dictionaries [81], [49], [96], [78], [107], [59], [31], [19]), video sequences (in a 3-D wavelet dictionary [85], [95]), and sonar or radar pulses (in a chirplet dictionary [5]). In each of these cases, the relevant
information in a sparse representation of a signal is encoded in both the locations (indices) of the significant 
coefficients and the values to which they are assigned. This type of uncertainty is an appropriate model for 
many natural signals with punctuated phenomena. 

Sparsity is a nonlinear model. In particular, let $\Sigma_K$ denote the set of all $K$-sparse signals for a given dictionary. It is easy to see that the set $\Sigma_K$ is not closed under addition. (In fact, $\Sigma_K + \Sigma_K = \Sigma_{2K}$.) From a geometric perspective, the set of all $K$-sparse signals from the dictionary $\Psi$ forms not a hyperplane but rather a union of $K$-dimensional hyperplanes, each spanned by $K$ vectors of $\Psi$ (see Figure 4.1(b)). For a dictionary $\Psi$ with $Z$ entries, there are $\binom{Z}{K}$ such hyperplanes. (The geometry of sparse signal collections has also been described in terms of orthosymmetric sets; see [58].)
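The failure of closure under addition is easy to see with coefficient vectors (equivalently, taking $\Psi = I$). In this sketch (assuming NumPy; `is_k_sparse` is a hypothetical helper), two 2-sparse vectors with different supports sum to a 4-sparse vector, and `math.comb` counts the $\binom{Z}{K}$ subspaces whose union is $\Sigma_K$.

```python
import numpy as np
from math import comb

def is_k_sparse(alpha, K):
    """True if at most K entries of alpha are nonzero."""
    return np.count_nonzero(alpha) <= K

Z, K = 6, 2
a1 = np.zeros(Z)
a1[[0, 3]] = [1.0, -2.0]   # 2-sparse, support {0, 3}
a2 = np.zeros(Z)
a2[[1, 4]] = [0.5, 3.0]    # 2-sparse, support {1, 4}

print(is_k_sparse(a1, K), is_k_sparse(a2, K))   # True True
print(is_k_sparse(a1 + a2, K))                  # False: the sum has 4 nonzeros
print(is_k_sparse(a1 + a2, 2 * K))              # True: the sum lies in Sigma_{2K}
print(comb(Z, K))                               # 15 K-dimensional subspaces
```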

Signals that are not strictly sparse but rather have a few "large" and many "small" coefficients are known as compressible signals. The notion of compressibility can be made more precise by considering the rate at which the sorted magnitudes of the coefficients $\alpha$ decay, and this decay rate can in turn be related to the $\ell_p$ norm of the coefficient vector $\alpha$. Letting $\widetilde{\alpha}$ denote a rearrangement of the vector $\alpha$ with the coefficients ordered in terms of decreasing magnitude, then the reordered coefficients satisfy [46]

$$|\widetilde{\alpha}_k| \le \|\alpha\|_{\ell_p} \, k^{-1/p}. \tag{4.3}$$

2 Lipschitz smoothness: We say a continuous-time function of $D$ variables has smoothness of order $H > 0$, where $H = r + \nu$, $r$ is an integer, and $\nu \in (0, 1]$, if the following criteria are met [81], [49]:

• All iterated partial derivatives with respect to the $D$ directions up to order $r$ exist and are continuous.

• All such partial derivatives of order $r$ satisfy a Lipschitz condition of order $\nu$ (also known as a Hölder condition). (A function $d \in \mathrm{Lip}(\nu)$ if $|d(t_1 + t_2) - d(t_1)| \le C \|t_2\|^{\nu}$ for all $D$-dimensional vectors $t_1, t_2$.)

We will sometimes consider the space of smooth functions whose partial derivatives up to order $r$ are bounded by some constant $\Omega$. With somewhat nonstandard notation, we denote the space of such bounded functions with bounded partial derivatives by $C^H$, where this notation carries an implicit dependence on $\Omega$. Observe that $r = \lceil H - 1 \rceil$, where $\lceil \cdot \rceil$ denotes rounding up. Also, when $H$ is an integer $C^H$ includes as a subset the space traditionally denoted by the notation "$C^H$" (the class of functions that have $H = r + 1$ continuous partial derivatives).

As we discuss in Section 5.2 (Nonlinear approximation), these decay rates play an important role in nonlinear approximation, where adaptive, $K$-sparse representations from the dictionary are used to approximate a signal.
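The bound (4.3) follows from a one-line argument: since the magnitudes are sorted, $k \, |\widetilde{\alpha}_k|^p \le \sum_{i=1}^{k} |\widetilde{\alpha}_i|^p \le \|\alpha\|_{\ell_p}^p$. The NumPy sketch below checks it on a heavy-tailed random vector (our choice of test data), with a tiny relative tolerance for rounding.

```python
import numpy as np

rng = np.random.default_rng(6)
# Heavy-tailed random coefficients: a few large entries and many small ones.
alpha = rng.standard_normal(1000) * rng.pareto(1.5, size=1000)

alpha_sorted = np.sort(np.abs(alpha))[::-1]   # |alpha~_k| in decreasing order
k = np.arange(1, alpha.size + 1)

holds = {}
for p in (0.5, 1.0, 2.0):
    norm_p = np.sum(np.abs(alpha) ** p) ** (1.0 / p)
    holds[p] = bool(np.all(alpha_sorted <= norm_p * k ** (-1.0 / p) * (1 + 1e-12)))
print(holds)   # the bound (4.3) holds for every p tested
```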

We recall from Section 4.1 (Linear models) that for a smooth signal $f$, the largest Fourier and wavelet coefficients tend to cluster at coarse scales (low frequencies). Suppose, however, that the function $f$ is piecewise smooth; i.e., it is $C^H$ at every point $t \in \mathbb{R}$ except for one point $t_0$, at which it is discontinuous. Naturally, this phenomenon will be reflected in the transform coefficients. In the Fourier domain, this discontinuity will have a global effect, as the overall smoothness of the function $f$ has been reduced dramatically from $H$ to $0$. Wavelet coefficients, however, depend only on local signal properties, and so the wavelet basis functions whose supports do not include $t_0$ will be unaffected by the discontinuity. Coefficients surrounding the singularity will decay only as $2^{-j/2}$, but there are relatively few such coefficients. Indeed, at each scale there are only $O(1)$ wavelets that include $t_0$ in their supports, but these locations are highly signal-dependent. (For modeling purposes, these significant coefficients will persist through scale down the parent-child tree structure.) After reordering by magnitude, the wavelet coefficients of piecewise smooth signals will have the same general decay rate as those of smooth signals. In Section 5.2 (Nonlinear approximation), we see that the quality of nonlinear approximations offered by wavelets for smooth 1-D signals is not hampered by the addition of a finite number of discontinuities.
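The contrast between Fourier and wavelet coefficients is clearest for a step. In this NumPy sketch (the Haar wavelet and the step location are our choices; a step is piecewise constant, hence piecewise smooth), exactly one Haar wavelet coefficient per scale is nonzero, namely the one whose support straddles the discontinuity, while every DFT coefficient is nonzero.

```python
import numpy as np

def haar_details(x, levels):
    """Orthonormal Haar wavelet coefficients, finest scale first."""
    a = np.asarray(x, dtype=float)
    out = []
    for _ in range(levels):
        out.append((a[0::2] - a[1::2]) / np.sqrt(2))
        a = (a[0::2] + a[1::2]) / np.sqrt(2)
    return out

N = 1024
x = (np.arange(N) >= 301).astype(float)   # step: discontinuity between samples 300 and 301

per_scale = [int(np.count_nonzero(np.abs(d) > 1e-12)) for d in haar_details(x, levels=10)]
n_dft = int(np.count_nonzero(np.abs(np.fft.fft(x)) > 1e-8))
print(per_scale)   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(n_dft)       # 1024
```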

4.3 Manifold models 

Manifold models generalize the conciseness of sparsity-based signal models. In particular, in many situations 
where a signal is believed to have a concise description or "few degrees of freedom," the result is that the 
signal will live on or near a particular submanifold of the ambient signal space. 

4.3.1 Parametric models

We begin with an abstract motivation for the manifold perspective. Consider a signal $f$ (such as a natural image), and suppose that we can identify some single 1-D piece of information about that signal that could be variable; that is, other signals might rightly be called "similar" to $f$ if they differ only in this piece of information. (For example, this 1-D parameter could denote the distance from some object in an image to the camera.) We let $\theta$ denote the variable parameter and write the signal as $f_\theta$ to denote its dependence on $\theta$. In a sense, $\theta$ is a single "degree of freedom" driving the generation of the signal $f_\theta$ under this simple model. We let $\Theta$ denote the set of possible values of the parameter $\theta$. If the mapping between $\theta$ and $f_\theta$ is well-behaved, then the collection of signals $\{f_\theta : \theta \in \Theta\}$ forms a 1-D path in the ambient signal space.

More generally, when a signal has $K$ degrees of freedom, we may model it as depending on some parameter $\theta$ that is chosen from a $K$-dimensional manifold $\Theta$. (The parameter space could be, for example, a subset of $\mathbb{R}^K$, or it could be a more general manifold such as $SO(3)$.) We again let $f_\theta$ denote the signal corresponding to a particular choice of $\theta$, and we let $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$. Assuming the mapping $f$ is continuous and injective over $\Theta$ (and its inverse is continuous), then by virtue of the manifold structure of $\Theta$, its image $\mathcal{F}$ will correspond to a $K$-dimensional manifold embedded in the ambient signal space (see Figure 4.1(c)).

These types of parametric models arise in a number of scenarios in signal processing. Examples include: 
signals of unknown translation, sinusoids of unknown frequency (across a continuum of possibilities), linear 
radar chirps described by a starting and ending time and frequency, tomographic or light field images with 
articulated camera positions, robotic systems with few physical degrees of freedom, dynamical systems with 
low-dimensional attractors [13], [15], and so on. 

In general, parametric signal manifolds are nonlinear (by which we mean non-affine as well); this can again be seen by considering the sum of two signals $f_{\theta_0} + f_{\theta_1}$, which in general does not lie on the manifold. In many interesting situations, signal manifolds are non-differentiable as well.
4.3.2 Nonparametric models

Manifolds have also been used to model signals for which there is no known parametric model. Examples 
include images of faces and handwritten digits [101], [9], which have been found empirically to cluster near 
low-dimensional manifolds. Intuitively, because of the configurations of human joints and muscles, it may be 
conceivable that there are relatively "few" degrees of freedom driving the appearance of a human face or the 
style of handwriting; however, this inclination is difficult or impossible to make precise. Nonetheless, certain 
applications in face and handwriting recognition have benefitted from algorithms designed to discover and 
exploit the nonlinear manifold-like structure of signal collections. Manifold Learning from Dimensionality
Reduction (Section 7.1: Manifold learning) discusses such methods for learning parametrizations and other 
information from data living along manifolds. 

Much more generally, one may consider, for example, the set of all natural images. Clearly, this set has 
small volume with respect to the ambient signal space — generating an image randomly pixel-by-pixel will 
almost certainly produce an unnatural noise-like image. Again, it is conceivable that, at least locally, this 
set may have a low-dimensional manifold-like structure: from a given image, one may be able to identify 
only a limited number of meaningful changes that could be performed while still preserving the natural look
of the image. Arguably, most work in signal modeling could be interpreted in some way as a search for this
overall structure. 
Chapter 5

Approximation 1 



To this point, we have discussed signal representations and models as basic tools for signal processing. In 
the following modules, we discuss the actual application of these tools to tasks such as approximation and 
compression, and we continue to discuss the geometric implications. 

5.1 Linear approximation 

One common prototypical problem in signal processing is to find the best linear approximation to a signal 
x. By "best linear approximation," we mean the best approximation to x from among a class of signals 
comprising a linear (or affine) subspace. This situation may arise, for example, when we have a noisy 
observation of a signal believed to obey a linear model. If we choose an $\ell_2$ error criterion, the solution to
this optimization problem has a particularly strong geometric interpretation. 

To be more concrete, suppose $S$ is a $K$-dimensional linear subspace of $\mathbb{R}^N$. (The case of an affine subspace follows similarly.) If we seek

$$s^* := \arg\min_{s \in S} \|s - x\|_2, \qquad (5.1)$$

standard linear algebra results state that the minimizer is given by

$$s^* = A^T A x, \qquad (5.2)$$

where $A$ is a $K \times N$ matrix whose rows form an orthonormal basis for $S$. Geometrically, one can easily see that this solution corresponds to an orthogonal projection of $x$ onto the subspace $S$ (see Figure 5.1(a)).
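As a quick numerical check of (5.2), the sketch below (Python with NumPy, an assumed toolchain) builds a matrix `A` with orthonormal rows from a QR factorization of a random matrix, forms $s^* = A^T A x$, and confirms that the residual is orthogonal to $S$ and that a generic least-squares fit over $S$ lands on the same point. The helper name `project_onto_subspace`, the dimensions, and the random data are illustrative only.

```python
import numpy as np

def project_onto_subspace(x, A):
    """Orthogonal projection s* = A^T A x of x onto S, the row space of A.

    A is a K x N matrix whose rows are assumed to be orthonormal.
    """
    return A.T @ (A @ x)

rng = np.random.default_rng(0)
N, K = 8, 3

# QR of a random N x K matrix yields Q with orthonormal columns, so A = Q^T
# has orthonormal rows spanning a random K-dimensional subspace S of R^N.
Q, _ = np.linalg.qr(rng.standard_normal((N, K)))
A = Q.T

x = rng.standard_normal(N)
s_star = project_onto_subspace(x, A)

# The residual x - s* is orthogonal to every basis vector of S.
print(np.allclose(A @ (x - s_star), 0.0))   # True

# A generic least-squares fit over S lands on the same point.
coeffs, *_ = np.linalg.lstsq(A.T, x, rcond=None)
print(np.allclose(A.T @ coeffs, s_star))    # True
```

Because the rows of `A` are orthonormal, no matrix inversion is needed; the least-squares call is included only as an independent cross-check.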



1 This content is available online at <http://cnx.org/content/m18727/1.5/>.
Figure 5.1: Approximating a signal $x \in \mathbb{R}^2$ with an $\ell_2$ error criterion. (a) Linear approximation using one element of the dictionary $\Psi$ corresponds to orthogonal projection of the signal onto the linear subspace. (b) Nonlinear approximation corresponds to orthogonal projection of the signal onto the nearest candidate subspace; in this case, we choose the best 1-sparse signal that can be built using $\Psi$. (c) Manifold-based approximation, finding the nearest point on $\mathcal{M}$.
The linear approximation problem arises frequently in settings involving signal dictionaries. In some settings, such as the case of an oversampled bandlimited signal, certain coefficients in the vector $\alpha$ may be assumed to be fixed at zero. In the case where the dictionary $\Psi$ forms an orthonormal basis, the linear approximation estimate of the unknown coefficients has a particularly simple form: rows of the matrix $A$ in (5.2) are obtained by selecting and transposing the columns of $\Psi$ whose expansion coefficients are unknown, and consequently, the unknown coefficients can be estimated simply by taking the inner products of $x$ against the appropriate columns of $\Psi$.
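The orthonormal-basis case can be sketched as follows, assuming NumPy and using an orthonormal DCT-II matrix as a stand-in for $\Psi$. The helper `dct_basis`, the test signal, and the choice of which coefficients are treated as unknown are illustrative assumptions, not taken from the text. The rows of `A` are the transposed columns of $\Psi$ whose coefficients are unknown, so each estimate is just an inner product $\langle x, \psi_k \rangle$.

```python
import numpy as np

def dct_basis(N):
    """Hypothetical helper: orthonormal DCT-II dictionary, column k = k-th cosine atom."""
    n = np.arange(N)[:, None]                  # sample index (rows)
    k = np.arange(N)[None, :]                  # frequency index (columns)
    Psi = np.cos(np.pi * (n + 0.5) * k / N)
    Psi[:, 0] *= np.sqrt(1.0 / N)
    Psi[:, 1:] *= np.sqrt(2.0 / N)
    return Psi

N = 64
Psi = dct_basis(N)
unknown = np.arange(8)                         # assume only the 8 lowest frequencies are unknown

t = (np.arange(N) + 0.5) / N
x = np.exp(-((t - 0.4) ** 2) / 0.02)           # illustrative smooth observation

A = Psi[:, unknown].T                          # rows = selected columns of Psi, transposed
alpha_hat = A @ x                              # each entry is an inner product <x, psi_k>
x_lin = A.T @ alpha_hat                        # s* = A^T A x, as in (5.2)

# The residual is orthogonal to every selected atom, as expected of a projection.
print(np.allclose(A @ (x - x_lin), 0.0))       # True
```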

For example, in choosing a fixed subset of the Fourier or wavelet dictionaries, one may rightfully choose 
the lowest frequency (coarsest scale) basis functions for the set S because, as discussed in Linear Models 
from Low-Dimensional Signal Models (Section 4.1: Linear models), the coefficients generally tend to decay
at higher frequencies (finer scales). For smooth functions, this strategy is appropriate and effective; functions 
in Sobolev smoothness spaces are well-approximated using linear approximations from the Fourier or wavelet 
dictionaries [82]. For piecewise smooth functions, however, even the wavelet-domain linear approximation 
strategy would miss out on significant coefficients at fine scales. Since the locations of such coefficients 
are unknown a priori, it is impossible to propose a linear wavelet-domain approximation scheme that
could simultaneously capture all piecewise smooth signals. As an example, Figure 5.2(a) shows the linear 
approximation of the Cameraman test image obtained by keeping only the lowest-frequency scaling and 
wavelet coefficients. No high-frequency information is available to clearly represent features such as edges. 
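To illustrate why a fixed low-frequency subspace suits smooth signals but not piecewise smooth ones, the following sketch assumes NumPy and uses an orthonormal DCT-II as a Fourier-type dictionary (rather than the wavelet decomposition used for Figure 5.2). It keeps the same $K$ lowest-frequency coefficients for a smooth bump and for the same bump with one added jump; the test signals, the jump location, and $K$ are arbitrary choices.

```python
import numpy as np

def dct_basis(N):
    """Hypothetical helper: orthonormal DCT-II basis (columns are the atoms)."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    Psi = np.cos(np.pi * (n + 0.5) * k / N)
    Psi[:, 0] *= np.sqrt(1.0 / N)
    Psi[:, 1:] *= np.sqrt(2.0 / N)
    return Psi

def linear_approx_error(x, Psi, K):
    """Relative l2 error when only the K lowest-frequency coefficients are kept."""
    alpha = Psi.T @ x
    x_lin = Psi[:, :K] @ alpha[:K]
    return np.linalg.norm(x - x_lin) / np.linalg.norm(x)

N, K = 256, 16
Psi = dct_basis(N)
t = (np.arange(N) + 0.5) / N
smooth = np.exp(-((t - 0.5) ** 2) / 0.02)           # smooth bump
jumpy = smooth + (t > 0.37).astype(float)           # same bump plus one step discontinuity

err_smooth = linear_approx_error(smooth, Psi, K)
err_jump = linear_approx_error(jumpy, Psi, K)
print(err_smooth, err_jump)
```

The jump spreads energy into high-frequency coefficients that the fixed subspace cannot capture, so the second relative error is much larger than the first.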
Figure 5.2: Linear versus nonlinear approximation in the wavelet domain. (a) Linear approximation of the Cameraman test image obtained by keeping the $K = 4096$ lowest-frequency wavelet coefficients from the five-level wavelet decomposition. The MSE with respect to the original image is 353. (b) Nonlinear approximation of the Cameraman test image obtained by keeping the $K = 4096$ largest wavelet coefficients from the five-level wavelet decomposition. The MSE with respect to the original image is 72. Compared with linear approximation, more high-frequency coefficients are included, which allows better representation of features such as edges.



5.2 Nonlinear approximation 



A related question often arises in settings involving signal dictionaries. Rather than finding the best approximation to a signal $f$ using a fixed collection of $K$ elements from the dictionary $\Psi$, one may often seek the best $K$-term representation to $f$ among all possible expansions that use $K$ terms from the dictionary. Compared to linear approximation, this type of nonlinear approximation [45], [39] utilizes the ability of the dictionary to adapt: different elements may be important for representing different signals.
The $K$-term nonlinear approximation problem corresponds to the optimization

$$s^*_{K,p} := \arg\min_{s \in \Sigma_K} \|s - f\|_p. \qquad (5.3)$$

(For the sake of generality, we consider general $L_p$ and $\ell_p$ norms in this section.) Due to the nonlinearity of the set $\Sigma_K$ of $K$-term expansions for a given dictionary, solving this problem can be difficult. Supposing $\Psi$ is an orthonormal basis and $p = 2$, the solution to (5.3) is easily obtained by thresholding: one simply computes the coefficients $\alpha$ and keeps the $K$ largest in magnitude (setting the remaining coefficients to zero). The approximation error is then given simply by the coefficients that are discarded:

$$\|s^*_{K,2} - f\|_2 = \left( \sum_{k > K} \tilde{\alpha}_k^2 \right)^{1/2}, \qquad (5.4)$$

where $\tilde{\alpha}_k$ denotes the $k$-th largest coefficient magnitude.

When $\Psi$ is a redundant dictionary, however, the situation is much more complicated. We say more about
this below (see also Figure 5.1(b)).
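For an orthonormal basis and $p = 2$, thresholding and the error identity (5.4) can be checked in a few lines. This sketch assumes NumPy, uses a random orthonormal basis from a QR factorization purely for illustration, and introduces `keep_largest` as a hypothetical helper name.

```python
import numpy as np

def keep_largest(alpha, K):
    """Zero all but the K largest-magnitude entries of alpha (hard thresholding)."""
    kept = np.zeros_like(alpha)
    idx = np.argsort(np.abs(alpha))[::-1][:K]
    kept[idx] = alpha[idx]
    return kept

rng = np.random.default_rng(1)
N, K = 32, 5
Psi, _ = np.linalg.qr(rng.standard_normal((N, N)))   # random orthonormal basis (illustrative)
f = rng.standard_normal(N)

alpha = Psi.T @ f                   # analysis: alpha_k = <f, psi_k>
s_K = Psi @ keep_largest(alpha, K)  # K-term approximation s*_{K,2}

err = np.linalg.norm(s_K - f)
sorted_mags = np.sort(np.abs(alpha))[::-1]
print(np.isclose(err, np.sqrt(np.sum(sorted_mags[K:] ** 2))))   # True, as in (5.4)
```

Because the basis is orthonormal, the error of any other choice of $K$ coefficients is the norm of a different discarded set, which can only be larger.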

5.2.1 Measuring approximation quality 

One common measure for the quality of a dictionary $\Psi$ in approximating a signal class is the fidelity of its $K$-term representations. Often one examines the asymptotic rate of decay of the $K$-term approximation error as $K$ grows large. Defining

$$\sigma_K(f)_p := \|s^*_{K,p} - f\|_p, \qquad (5.5)$$

for a given signal $f$ we may consider the asymptotic decay of $\sigma_K(f)_p$ as $K \to \infty$. (Recall that (5.3), and hence (5.5), depend on the dictionary $\Psi$.) In many cases, the function $\sigma_K(f)_p$ will decay as $K^{-r}$ for some $r$, and when $\Psi$ represents a harmonic dictionary, faster decay rates tend to correspond to smoother functions. Indeed, one can show that when $\Psi$ is an orthonormal basis, then $\sigma_K(f)_2$ will decay as $K^{-r}$ if and only if the sorted magnitudes $\tilde{\alpha}_k$ decay as $k^{-r-1/2}$ [47].
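One direction of this relation can be checked numerically. The sketch below assumes NumPy, takes hypothetical sorted coefficient magnitudes $\tilde{\alpha}_k = k^{-r-1/2}$, computes $\sigma_K(f)_2$ from (5.4) as the square root of the discarded energy, and fits the log-log slope, which should come out close to $-r$. The value of $r$ and the truncation length are arbitrary.

```python
import numpy as np

r = 1.5
k = np.arange(1, 200_001, dtype=float)
alpha_sorted = k ** (-r - 0.5)                  # hypothetical sorted magnitudes, k = 1, 2, ...

# tail[j] = sum of alpha_sorted[j:]**2, i.e. the energy of coefficients k > j
tail = np.cumsum((alpha_sorted ** 2)[::-1])[::-1]
Ks = np.array([100, 200, 400, 800, 1600])
sigma = np.sqrt(tail[Ks])                       # sigma_K(f)_2 via (5.4)

slope = np.polyfit(np.log(Ks), np.log(sigma), 1)[0]
print(slope)                                    # close to -r
```

The tail sum $\sum_{k > K} k^{-2r-1}$ behaves like $K^{-2r}/(2r)$, which is where the $K^{-r}$ rate comes from.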

5.2.2 Nonlinear approximation of piecewise smooth functions 

Let $f \in C^H$ be a 1-D function. If the wavelet dictionary has more than $H$ vanishing moments, then $f$ can be well approximated using its $K$ largest coefficients (most of which are at coarse scales). As $K$ grows large, the nonlinear approximation error will decay as $\sigma_K(f)_2 \lesssim K^{-H}$ (see footnote 2).

If, however, $f$ is piecewise smooth with a finite number of discontinuities, then (as discussed in Sparse (Nonlinear) Models from Low-Dimensional Signal Models (Section 4.2: Sparse (nonlinear) models)) $f$ will have a limited number of significant wavelet coefficients at fine scales. Because of the concentration of these significant coefficients within each scale, the nonlinear approximation rate will remain $\sigma_K(f)_2 \lesssim K^{-H}$ as if there were no discontinuities present [82].

Unfortunately, this resilience of wavelets to discontinuities does not extend to higher dimensions. Suppose, for example, that $f$ is a $C^H$ smooth 2-D signal. Assuming the proper number of vanishing moments, a wavelet representation will achieve the optimal nonlinear approximation rate $\sigma_K(f)_2 \lesssim K^{-H/2}$ [37], [82]. As in the 1-D case, this approximation rate is maintained when a finite number of point discontinuities are introduced into $f$. However, when $f$ contains 1-D discontinuities (edges separating the smooth regions), the approximation rate will fall to $\sigma_K(f)_2 \lesssim K^{-1/2}$ [82]. The problem actually arises due to the isotropic, dyadic supports of the wavelets: instead of $O(1)$ significant wavelets at each scale, there are now $O(2^j)$ wavelets overlapping the discontinuity at scale $j$ (with $j$ increasing toward finer scales). We revisit this important issue in Compression (Chapter 6).
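The scale-by-scale count can be reproduced with a small experiment. The sketch below assumes NumPy, implements one level of the orthonormal 2-D Haar transform (the simplest wavelet, not necessarily the one used in the cited results), and counts nonzero detail coefficients at each scale for two piecewise-constant test images: a disk, whose boundary is a curved 1-D edge, and a single bright pixel standing in for a point discontinuity. Piecewise-constant regions are an assumption chosen so that wavelets away from the singularities vanish exactly; the image size, disk position, and pixel location are arbitrary.

```python
import numpy as np

def haar2d_level(img):
    """One level of the orthonormal 2-D Haar transform (image sides must be even)."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    approx = (a + b + c + d) / 2
    details = ((a - b + c - d) / 2,    # differences between neighboring columns
               (a + b - c - d) / 2,    # differences between neighboring rows
               (a - b - c + d) / 2)    # diagonal differences
    return approx, details

def significant_per_scale(img, levels, tol=1e-12):
    """Nonzero detail coefficients at each scale, listed from finest to coarsest."""
    counts, approx = [], img
    for _ in range(levels):
        approx, details = haar2d_level(approx)
        counts.append(sum(int(np.count_nonzero(np.abs(d) > tol)) for d in details))
    return counts

n = 256
yy, xx = np.mgrid[0:n, 0:n]
disk = (((xx - 128.4) ** 2 + (yy - 127.7) ** 2) < 70.3 ** 2).astype(float)  # curved edge
spike = np.zeros((n, n))
spike[77, 190] = 1.0                                                        # point discontinuity

edge_counts = significant_per_scale(disk, 5)
point_counts = significant_per_scale(spike, 5)
print("edge: ", edge_counts)
print("point:", point_counts)   # point: [3, 3, 3, 3, 3]
```

For the isolated pixel the count stays at three per scale (one coefficient in each detail subband), whereas for the disk it roughly doubles with each finer scale, in line with the $O(1)$ versus $O(2^j)$ behavior described above.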

Despite the limited approximation capabilities for images with edges, nonlinear approximation in the wavelet domain typically offers a superior approximation to an image compared to linear approximation in the wavelet domain. As an example, Figure 5.2(b) shows the nonlinear approximation of the Cameraman test image obtained by keeping the largest scaling and wavelet coefficients. In this case, a number of high-frequency coefficients are selected, which gives an improved ability to represent features such as edges. Better concise transforms, which capture the image information in even fewer coefficients, would offer further improvements in terms of nonlinear approximation quality.

2 We use the notation $f(a) \lesssim g(a)$, or $f(a) = O(g(a))$, if there exists a constant $C$, possibly large but not dependent on the argument $a$, such that $f(a) \le C g(a)$.

5.2.3 Finding approximations

As mentioned above, in the case where $\Psi$ is an orthonormal basis and $p = 2$, the solution to (5.3) is easily obtained by thresholding: one simply computes the coefficients $\alpha$ and keeps the $K$ largest in magnitude (setting the remaining coefficients to zero). Thresholding can also be shown to be optimal for arbitrary $\ell_p$ norms in the special case where $\Psi$ is the canonical basis. While the optimality of thresholding does not generalize to arbitrary norms and bases, thresholding can be shown to be a near-optimal approximation strategy for wavelet bases with arbitrary $L_p$ norms [47].

In the case where $\Psi$ is a redundant dictionary, however, the expansion coefficients $\alpha$ are not unique, and the optimization problem (5.3) can be much more difficult to solve. Indeed, even supposing that an exact $K$-term representation exists for $f$ in the dictionary $\Psi$, finding that $K$-term approximation is NP-hard in general, requiring a combinatorial enumeration of the $\binom{Z}{K}$ possible sparse subspaces [26]. This search can be recast as the optimization problem

$$\hat{\alpha} = \arg\min \|\alpha\|_0 \ \text{s.t.}\ f = \Psi\alpha. \qquad (5.6)$$
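The combinatorial search behind (5.6) can be written directly for a tiny dictionary. This brute-force sketch assumes NumPy; `sparsest_exact` is a hypothetical helper that tries every support of size 1, 2, ... in turn and returns the first exact representation it finds. Its cost grows like $\binom{Z}{K}$, which is why it is only usable at toy sizes. The random dictionary and test signal are illustrative.

```python
import itertools
import math
import numpy as np

def sparsest_exact(Psi, f, max_K, tol=1e-9):
    """Brute-force solver for (5.6): sparsest alpha with f = Psi @ alpha, up to max_K terms."""
    Z = Psi.shape[1]
    scale = max(1.0, np.linalg.norm(f))
    for K in range(1, max_K + 1):
        for support in itertools.combinations(range(Z), K):   # all C(Z, K) subspaces
            cols = list(support)
            coef, *_ = np.linalg.lstsq(Psi[:, cols], f, rcond=None)
            if np.linalg.norm(Psi[:, cols] @ coef - f) <= tol * scale:
                alpha = np.zeros(Z)
                alpha[cols] = coef
                return alpha
    return None   # no exact representation with at most max_K terms

rng = np.random.default_rng(2)
N, Z = 6, 12
Psi = rng.standard_normal((N, Z))
Psi /= np.linalg.norm(Psi, axis=0)            # unit-norm atoms
alpha_true = np.zeros(Z)
alpha_true[[3, 9]] = [1.5, -2.0]
f = Psi @ alpha_true

alpha_hat = sparsest_exact(Psi, f, max_K=3)
print(np.flatnonzero(alpha_hat))              # [3 9]
print(math.comb(Z, 3))                        # 220 candidate supports of size 3
```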

While solving (5.6) is prohibitively complex, a variety of algorithms have been proposed as alternatives. One approach convexifies the optimization problem by replacing the $\ell_0$ criterion with an $\ell_1$ criterion

$$\hat{\alpha} = \arg\min \|\alpha\|_1 \ \text{s.t.}\ f = \Psi\alpha. \qquad (5.7)$$

This problem, known as Basis Pursuit [34], is significantly more approachable and can be solved with traditional linear programming techniques whose computational complexities are polynomial in $Z$. The $\ell_1$ criterion has the advantage of yielding a convex optimization problem while still encouraging sparse solutions due to the polytope geometry of the $\ell_1$ unit ball (see for example [55] and [61]). Iterative greedy algorithms such as Matching Pursuit (MP) and Orthogonal Matching Pursuit (OMP) [82] have also been suggested to find sparse representations $\alpha$ for a signal $f$. Both MP and OMP iteratively select the column of $\Psi$ that is most correlated with the current residual (initially $f$ itself), then subtract the contribution of that column, leaving a new residual. OMP includes an additional step at each iteration where the residual is orthogonalized against all previously selected columns.
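A compact OMP sketch follows, assuming NumPy; `omp` and `dct_basis` are hypothetical helpers. Each iteration picks the atom most correlated with the current residual and then re-fits all selected coefficients by least squares, which is the orthogonalization step that distinguishes OMP from MP. The test dictionary concatenates the identity (spikes) with an orthonormal DCT-II basis; its mutual coherence is $\mu = \sqrt{2/N}$, small enough to satisfy the standard coherence condition for exact OMP recovery, $(2K-1)\mu < 1$, for the three atoms used here.

```python
import numpy as np

def dct_basis(N):
    """Hypothetical helper: orthonormal DCT-II basis as an N x N matrix of columns."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    Psi = np.cos(np.pi * (n + 0.5) * k / N)
    Psi[:, 0] *= np.sqrt(1.0 / N)
    Psi[:, 1:] *= np.sqrt(2.0 / N)
    return Psi

def omp(Psi, f, K):
    """Orthogonal Matching Pursuit for a dictionary Psi with unit-norm columns."""
    residual = f.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(K):
        # Greedy selection: atom most correlated with the current residual.
        j = int(np.argmax(np.abs(Psi.T @ residual)))
        if j not in support:
            support.append(j)
        # Orthogonalization: least-squares re-fit on all selected atoms, so the
        # new residual is orthogonal to their span (MP would skip this re-fit).
        coef, *_ = np.linalg.lstsq(Psi[:, support], f, rcond=None)
        residual = f - Psi[:, support] @ coef
    alpha = np.zeros(Psi.shape[1])
    alpha[support] = coef
    return alpha

N = 64
Psi = np.hstack([np.eye(N), dct_basis(N)])    # spikes + cosines, Z = 2N atoms
alpha_true = np.zeros(2 * N)
alpha_true[[10, N + 5, N + 20]] = [2.0, -1.5, 1.0]
f = Psi @ alpha_true

alpha_hat = omp(Psi, f, K=3)
print(np.flatnonzero(np.abs(alpha_hat) > 1e-8))   # [10 69 84]
```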

5.3 Manifold approximation 

We also consider the problem of finding the best manifold-based approximation to a signal (see Figure 5.1(c)). Suppose that $\mathcal{F} = \{f_\theta : \theta \in \Theta\}$ is a parametrized $K$-dimensional manifold and that we are given a signal $I$ that is believed to approximate $f_\theta$ for an unknown $\theta \in \Theta$. From $I$ we wish to recover an estimate of $\theta$. Again, we may formulate this parameter estimation problem as an optimization, writing the objective function (here we concentrate solely on the $L_2$ or $\ell_2$ case)

$$D(\theta) = \|f_\theta - I\|_2^2 \qquad (5.8)$$

and solving for

$$\theta^* = \arg\min_{\theta \in \Theta} D(\theta). \qquad (5.9)$$

We suppose that the minimum is uniquely defined.
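A brute-force way to approximate (5.9) for a one-parameter manifold is to evaluate $D(\theta)$ on a fine grid of candidate parameters. The sketch below assumes NumPy and a hypothetical manifold of translated Gaussian pulses $f_\theta$ sampled on $[0, 1)$; the integral in (5.8) is replaced by a Riemann sum, and the pulse width, noise level, and grid are arbitrary choices. Such a grid search is also a common way to initialize a local method.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 512, endpoint=False)   # sampling grid standing in for the continuum
dx = x[1] - x[0]
w = 0.05                                          # pulse width (illustrative)

def f_theta(theta):
    """Hypothetical 1-D manifold: a Gaussian pulse centered at theta."""
    return np.exp(-((x - theta) ** 2) / (2 * w ** 2))

def D(theta, I):
    """Riemann-sum approximation of D(theta) = ||f_theta - I||_2^2 from (5.8)."""
    return np.sum((f_theta(theta) - I) ** 2) * dx

rng = np.random.default_rng(5)
theta_true = 0.437
I = f_theta(theta_true) + 0.01 * rng.standard_normal(x.size)   # noisy observation

grid = np.linspace(0.1, 0.9, 801)                 # candidate parameters (spacing 0.001)
values = np.array([D(th, I) for th in grid])
theta_star = grid[np.argmin(values)]              # grid approximation of (5.9)
print(theta_star)
```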
Standard nonlinear parameter estimation [8] tells us that, if $D$ is differentiable, we can use Newton's method to iteratively refine a sequence of guesses $\theta^{(0)}, \theta^{(1)}, \theta^{(2)}, \ldots$ to $\theta^*$ and rapidly converge to the true value. Supposing that $\mathcal{F}$ is a differentiable manifold, we would let

$$J = \left[ \frac{\partial D}{\partial \theta_0} \;\; \frac{\partial D}{\partial \theta_1} \;\; \cdots \;\; \frac{\partial D}{\partial \theta_{K-1}} \right]^T \qquad (5.10)$$

be the gradient of $D$, and let $H$ be the $K \times K$ Hessian, $H_{ij} = \frac{\partial^2 D}{\partial \theta_i \partial \theta_j}$. Assuming $D$ is twice differentiable, Newton's method specifies the following update step:

$$\theta^{(k+1)} \leftarrow \theta^{(k)} - \left[ H(\theta^{(k)}) \right]^{-1} J(\theta^{(k)}). \qquad (5.11)$$

To relate this method to the structure of the manifold, we can actually express the gradient and Hessian in terms of signals, writing

$$D(\theta) = \|f_\theta - I\|_2^2 = \int (f_\theta - I)^2 \, dx = \int \left( f_\theta^2 - 2 I f_\theta + I^2 \right) dx. \qquad (5.12)$$

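Differentiating with respect to component $\theta_i$, we obtain

$$
\begin{aligned}
\frac{\partial D}{\partial \theta_i} = J_i &= \frac{\partial}{\partial \theta_i} \int \left( f_\theta^2 - 2 I f_\theta + I^2 \right) dx \\
&= \int \left( \frac{\partial}{\partial \theta_i}\left(f_\theta^2\right) - 2 I \frac{\partial f_\theta}{\partial \theta_i} \right) dx \\
&= \int \left( 2 f_\theta \tau_\theta^i - 2 I \tau_\theta^i \right) dx \\
&= 2 \langle f_\theta - I, \tau_\theta^i \rangle, \qquad (5.13)
\end{aligned}
$$

where $\tau_\theta^i = \frac{\partial f_\theta}{\partial \theta_i}$ is a tangent signal. Continuing, we examine the Hessian,

$$
\begin{aligned}
\frac{\partial^2 D}{\partial \theta_i \partial \theta_j} = H_{ij} &= \frac{\partial}{\partial \theta_j}\left( \frac{\partial D}{\partial \theta_i} \right) \\
&= \int \frac{\partial}{\partial \theta_j} \left( 2 f_\theta \tau_\theta^i - 2 I \tau_\theta^i \right) dx \\
&= \int \left( 2 \tau_\theta^i \tau_\theta^j + 2 f_\theta \tau_\theta^{ij} - 2 I \tau_\theta^{ij} \right) dx \\
&= 2 \langle \tau_\theta^i, \tau_\theta^j \rangle + 2 \langle f_\theta - I, \tau_\theta^{ij} \rangle, \qquad (5.14)
\end{aligned}
$$

where $\tau_\theta^{ij} = \frac{\partial^2 f_\theta}{\partial \theta_i \partial \theta_j}$ denotes a second-derivative signal. Thus, we can interpret Newton's method geometrically as (essentially) a sequence of successive projections onto tangent spaces on the manifold.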
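For a one-parameter ($K = 1$) manifold, the gradient (5.13) and Hessian (5.14) reduce to inner products with the tangent signal $\tau_\theta$ and the second-derivative signal $\partial^2 f_\theta / \partial \theta^2$, which makes Newton's method short to write down. The sketch below assumes NumPy and a hypothetical manifold of translated Gaussian pulses, computes both signals analytically, approximates inner products by Riemann sums, and applies the update (5.11) from an initial guess that is assumed to be close to the true parameter (for example, one obtained from a coarse grid search).

```python
import numpy as np

x = np.linspace(0.0, 1.0, 512, endpoint=False)   # sampling grid standing in for the continuum
dx = x[1] - x[0]
w = 0.05                                          # pulse width (illustrative)

def f(theta):
    """Hypothetical 1-D manifold: Gaussian pulse centered at theta."""
    return np.exp(-((x - theta) ** 2) / (2 * w ** 2))

def tau(theta):
    """Tangent signal: derivative of f(theta) with respect to theta."""
    return f(theta) * (x - theta) / w ** 2

def tau2(theta):
    """Second-derivative signal: d^2 f(theta) / d theta^2."""
    return f(theta) * ((x - theta) ** 2 / w ** 4 - 1.0 / w ** 2)

def inner(a, b):
    """Riemann-sum approximation of the L2 inner product."""
    return np.sum(a * b) * dx

def newton(I, theta0, iters=8):
    theta = theta0
    for _ in range(iters):
        r = f(theta) - I
        J = 2 * inner(r, tau(theta))                                       # gradient, (5.13)
        H = 2 * inner(tau(theta), tau(theta)) + 2 * inner(r, tau2(theta))  # Hessian, (5.14)
        theta = theta - J / H                                              # update, (5.11)
    return theta

theta_true = 0.437
I = f(theta_true)                                 # noise-free observation on the manifold
theta_hat = newton(I, theta0=theta_true + 0.02)   # initial guess assumed to be nearby
print(abs(theta_hat - theta_true))
```

Convergence here relies on the starting point lying in the region where $D$ is locally convex; from a poor initial guess the Hessian can become negative and the iteration can move away from $\theta^*$.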

Again, the above discussion assumes the manifold to be differentiable. Many interesting parametric signal 
manifolds are in fact nowhere differentiable — the tangent spaces demanded by Newton's method do not 
exist. However, in [105] we have identified a type of multiscale tangent structure to the manifold that permits 
a coarse-to-fine technique for parameter estimation. 
Chapter 6

Compression 1 



6.1 Transform coding 

In Nonlinear Approximation from Approximation (Section 5.2: Nonlinear approximation), we measured the 
quality of a dictionary in terms of its $K$-term approximations to signals drawn from some class. One reason
that such approximations are desirable is that they provide concise descriptions of the signal that can be 
easily stored, processed, etc. There is even speculation and evidence that neurons in the human visual system 
may use sparse coding to represent a scene [89]. 

For data compression, conciseness is often exploited in a popular technique known as transform coding. 
Given a signal $f$ (for which a concise description may not be readily apparent in its native domain), the idea is simply to use the dictionary $\Psi$ to transform $f$ to its coefficients $\alpha$, which can then be efficiently and easily described. As discussed above, perhaps the simplest strategy for summarizing a sparse $\alpha$ is simply to threshold, keeping the $K$ largest-magnitude coefficients and discarding the rest. A simple encoder would then just encode the positions and quantized values of these $K$ coefficients.
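The simple encoder described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not a practical coder: it assumes NumPy, uses an orthonormal DCT-II as $\Psi$, quantizes the $K$ largest-magnitude coefficients with a uniform step, and estimates the rate with a crude fixed-length count (a position index plus a sign-magnitude integer per coefficient) rather than an entropy coder. `encode`, `decode`, `dct_basis`, and all parameter values are hypothetical.

```python
import numpy as np

def dct_basis(N):
    """Hypothetical helper: orthonormal DCT-II basis (columns are the atoms)."""
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    Psi = np.cos(np.pi * (n + 0.5) * k / N)
    Psi[:, 0] *= np.sqrt(1.0 / N)
    Psi[:, 1:] *= np.sqrt(2.0 / N)
    return Psi

def encode(alpha, K, step):
    """Toy encoder: (position, quantized integer) for the K largest-magnitude coefficients."""
    idx = np.sort(np.argsort(np.abs(alpha))[::-1][:K])
    q = np.round(alpha[idx] / step).astype(int)
    return list(zip(idx.tolist(), q.tolist()))

def decode(code, N, step):
    """Rebuild a coefficient vector; every coefficient not transmitted is zero."""
    alpha_hat = np.zeros(N)
    for i, qi in code:
        alpha_hat[i] = qi * step
    return alpha_hat

N, K, step = 256, 20, 0.05
Psi = dct_basis(N)
t = (np.arange(N) + 0.5) / N
f = np.exp(-((t - 0.5) ** 2) / 0.02) + 0.5 * np.sin(6 * np.pi * t)   # illustrative signal

alpha = Psi.T @ f                         # transform
code = encode(alpha, K, step)             # threshold + quantize
f_hat = Psi @ decode(code, N, step)       # inverse transform of the decoded coefficients

# Crude rate estimate: ceil(log2 N) bits per position plus a sign-magnitude integer.
bits = sum(int(np.ceil(np.log2(N))) + abs(q).bit_length() + 1 for _, q in code)
print(len(code), bits, np.linalg.norm(f - f_hat) / np.linalg.norm(f))
```

The position bits are the overhead that R-D analyses of transform coding must account for alongside the coefficient values.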

6.2 Metric entropy 

Suppose $f$ is a function and let $\hat{f}_R$ be an approximation to $f$ encoded using $R$ bits. To evaluate the quality of a coding strategy, it is common to consider the asymptotic rate-distortion (R-D) performance, which measures the decay rate of $\|f - \hat{f}_R\|_{L_p}$ as $R \to \infty$. The metric entropy [74] for a class $\mathcal{F}$ gives the best decay rate that can be achieved uniformly over all functions $f \in \mathcal{F}$. We note that this is a true measure for the complexity of a class and is tied to no particular dictionary or encoding strategy. The metric entropy also has a very geometric interpretation, as it relates to the smallest radius possible for a covering of the set $\mathcal{F}$ by $2^R$ balls.

Metric entropies are known for certain signal classes. For example, the results of Clements [36] (extending those of Kolmogorov and Tihomirov [74]) regarding metric entropy give bounds on the optimal achievable asymptotic rate-distortion performance for $D$-dimensional $C^H$-smooth functions $f$ (see also [38]):

$$\|f - \hat{f}_R\|_{L_p} \lesssim \left( \frac{1}{R} \right)^{H/D}. \qquad (6.1)$$

Rate-distortion performance measures the complexity of a representation and encoding strategy. In the case of transform coding, for example, R-D results account for the bits required to encode both the values of the significant coefficients and their locations. Nonetheless, in many cases transform coding is indeed an effective strategy for encoding signals that have sparse representations [40]. For example, in [38] Cohen et al. propose a wavelet-domain coder that uses a connected-tree structure to efficiently encode the positions of the significant coefficients and prove that this encoding strategy achieves the optimal rate

$$\|f - \hat{f}_R\|_{L_p} \lesssim \left( \frac{1}{R} \right)^{H/D}. \qquad (6.2)$$

1 This content is available online at <http://cnx.org/content/m18729/1.3/>.



6.3 Compression of piecewise smooth images 

In some cases, however, the sparsity of the wavelet transform may not reflect the true underlying structure 
of a signal. Examples are 2-D piecewise smooth signals with a smooth edge discontinuity separating the 
smooth regions. As we discussed in Nonlinear Approximation from Approximation (Section 5.2: Nonlinear 
approximation), wavelets fail to sparsely represent these functions, and so the R-D performance for simple 
thresholding-based coders will suffer as well. In spite of all of the benefits of wavelet representations for 
signal processing (low computational complexity, tree structure, sparse approximations for smooth signals), 
this failure to efficiently represent edges is a significant drawback. In many images, edges carry some of the 
most prominent and important information [84], and so it is desirable to have a representation well-suited 
to compressing edges in images. 

To address this concern, recent work in harmonic analysis has focused on developing representations 
that provide sparse decompositions for certain geometric image classes. Examples include curvelets [30], [20] 
and contourlets [52], slightly redundant tight frames consisting of anisotropic, "needle-like" atoms. In [77],
bandelets are formed by warping an orthonormal wavelet basis to conform to the geometrical structure in 
the image. A nonlinear multiscale transform that adapts to discontinuities (and can represent a "clean" 
edge using very few coarse scale coefficients) is proposed in [3]. Each of these new representations has been 
shown to achieve near-optimal asymptotic approximation and R-D performance for piecewise smooth images 
consisting of $C^H$ regions separated by discontinuities along $C^H$ curves, with $H = 2$ ($H \ge 2$ for bandelets).
Some have also found use in specialized compression applications such as identification photos [1]. 

In [33], we have presented a scheme that is based on the simple yet powerful observation that geometric 
features can be efficiently approximated using local, geometric atoms in the spatial domain, and that the 
projection of these geometric primitives onto wavelet subspaces can therefore approximate the corresponding 
wavelet coefficients. We prove that the resulting dictionary achieves the optimal nonlinear approximation 
rates for piecewise smooth signal classes. To account for the added complexity of this encoding strategy, 
we also consider R-D results and prove that this scheme comes within a logarithmic factor of the optimal 
performance rate. Unlike the techniques mentioned above, our method also generalizes to arbitrary orders 
of smoothness and arbitrary signal dimension. 
Chapter 7

Dimensionality Reduction 1



Recent years have seen a proliferation of novel techniques for what can loosely be termed "dimensionality 
reduction." Like the tasks of approximation and compression discussed above, these methods involve some 
aspect in which low-dimensional information is extracted about a signal or collection of signals in some 
high-dimensional ambient space. Unlike the tasks of approximation and compression, however, the goal of 
these methods is not always to maintain a faithful representation of each signal. Instead, the purpose may 
be to preserve some critical relationships among elements of a data set or to discover information about a 
manifold on which the data lives. 

In this section, we review two general methods for dimensionality reduction. Section 7.1 (Manifold learning) begins with a brief overview of techniques for manifold learning. Section 7.2 (The Johnson-Lindenstrauss lemma) then discusses the Johnson-Lindenstrauss (JL) lemma, which concerns the near-isometric embedding of a cloud of points as it is projected to a lower-dimensional space. Though at first glance the JL lemma does not
pertain to any of the low-dimensional signal models we have previously discussed, we later see in Connections 
with dimensionality reduction (Section 8.6: Connections with dimensionality reduction) that the JL lemma 
plays a critical role in the core theory of CS, and we also employ the JL lemma in developing a theory for 
isometric embeddings of manifolds. 

7.1 Manifold learning 

Several techniques have been proposed for solving a problem known as manifold learning in which certain 
properties of a manifold are inferred from a discrete collection of points sampled from that manifold. A 
typical manifold learning setup is as follows: an algorithm is presented with a set of $P$ points sampled from a $K$-dimensional submanifold of $\mathbb{R}^N$. The goal of the algorithm is to produce a mapping of these $P$ points into some lower dimension $\mathbb{R}^M$ (ideally, $M = K$) while preserving some characteristic property of the manifold. Example algorithms include ISOMAP [98], Hessian Eigenmaps (HLLE) [60], and Maximum Variance Unfolding (MVU) [106], which attempt to learn isometric embeddings of the manifold (thus preserving pairwise geodesic distances in $\mathbb{R}^M$); Locally Linear Embedding (LLE) [93], which attempts to preserve local linear
neighborhood structures among the embedded points; Local Tangent Space Alignment (LTSA) [108], which 
attempts to preserve local coordinates in each tangent space; and a method for charting a manifold [12] that 
attempts to preserve local neighborhood structures. 

The internal mechanics of these algorithms differ depending on the objective criterion to be preserved, but as an example, the ISOMAP algorithm operates by first estimating the geodesic distance between each pair of points on the manifold (by approximating geodesic distance as the length of the shortest path through a graph that connects each sample point to its nearest neighbors, that is, as a sum of Euclidean distances between neighboring sample points). After the $P \times P$ matrix of pairwise geodesic distances is constructed, a technique known as multidimensional scaling uses an eigendecomposition of the (double-centered, squared) distance matrix to determine the proper $M$-dimensional embedding space. An example of using ISOMAP to learn a 2-dimensional manifold is shown in Figure 7.1.

1 This content is available online at <http://cnx.org/content/m18732/1.5/>.
Figure 7.1: Manifold learning demonstration. (a) As input to the manifold learning algorithm, 1000 images of size 64 × 64 are created, where each image consists of a white disk translated to a random position $(\theta_1, \theta_2)$. It follows that the images represent a sampling of 1000 points from a 2-dimensional submanifold of $\mathbb{R}^{4096}$. (b) Scatter plot of the true values for the $(\theta_1, \theta_2)$ positions. For visibility in each plot, the color of each point indicates the true $\theta_1$ value. (c) ISOMAP embedding learned from original data points in $\mathbb{R}^{4096}$. From the low-dimensional embedding coordinates we can infer the relative positions of the original high-dimensional images. (d) ISOMAP embedding learned from a random projection of the data set to $\mathbb{R}^M$, where $M = 15$.
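The ISOMAP pipeline described above (neighborhood graph, shortest-path geodesic estimates, then classical multidimensional scaling) can be sketched compactly. The version below assumes NumPy, uses Floyd-Warshall for the shortest paths (adequate only for small $P$), and is demonstrated on a 1-D manifold, a 270-degree circular arc in $\mathbb{R}^2$, rather than the 2-D image manifold of Figure 7.1; `isomap` and its parameters are illustrative, not a reference implementation.

```python
import numpy as np

def isomap(X, n_neighbors, M):
    """Minimal ISOMAP sketch: X is P x N (one sample per row); returns P x M coordinates."""
    P = X.shape[0]
    # Euclidean distances between all pairs of samples
    diff = X[:, None, :] - X[None, :, :]
    E = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Neighborhood graph: connect each point to its n_neighbors nearest points
    E_no_self = E.copy()
    np.fill_diagonal(E_no_self, np.inf)
    nearest = np.argsort(E_no_self, axis=1)[:, :n_neighbors]
    G = np.full((P, P), np.inf)
    rows = np.repeat(np.arange(P), n_neighbors)
    G[rows, nearest.ravel()] = E[rows, nearest.ravel()]
    G = np.minimum(G, G.T)                     # make the graph undirected
    np.fill_diagonal(G, 0.0)
    # Geodesic estimates: shortest paths through the graph (Floyd-Warshall)
    for k in range(P):
        G = np.minimum(G, G[:, k:k + 1] + G[k:k + 1, :])
    if np.isinf(G).any():
        raise ValueError("neighborhood graph is disconnected; increase n_neighbors")
    # Classical MDS: eigendecomposition of the double-centered squared distances
    J = np.eye(P) - np.ones((P, P)) / P
    B = -0.5 * J @ (G ** 2) @ J
    evals, evecs = np.linalg.eigh(B)           # eigenvalues in ascending order
    top = np.argsort(evals)[::-1][:M]
    return evecs[:, top] * np.sqrt(np.maximum(evals[top], 0.0))

P = 40
t = np.linspace(0.0, 1.5 * np.pi, P)           # arc-length parameter (radius 1)
X = np.column_stack([np.cos(t), np.sin(t)])    # samples of a 270-degree arc in R^2
Y = isomap(X, n_neighbors=2, M=1)

# The 1-D embedding should order the points along the arc (up to a sign flip).
print(abs(np.corrcoef(Y[:, 0], t)[0, 1]) > 0.99)   # True
```

A linear method applied to the same arc would fold its two ends toward each other, since their Euclidean distance is much shorter than their geodesic distance; the graph-based geodesic estimates avoid this.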



These algorithms can be useful for learning the dimension and parametrizations of manifolds, for sorting 
data, for visualization and navigation through the data, and as preprocessing to make further analysis more 
tractable; common demonstrations include analysis of face images and classification of handwritten
digits. A related technique, the Whitney Reduction Network [14], [16], seeks a linear mapping to $\mathbb{R}^M$ that
preserves ambient pairwise distances on the manifold and is particularly useful for processing the output of 
dynamical systems having low-dimensional attractors. 

Other algorithms have been proposed for characterizing manifolds from sampled data without constructing an explicit embedding in $\mathbb{R}^M$. The Geodesic Minimal Spanning Tree (GMST) [42] models the data as random samples from the manifold and estimates the corresponding entropy and dimensionality. Another technique [88] has been proposed for using random samples of a manifold to estimate its homology (via the Betti numbers, which essentially characterize its dimension, number of connected components, etc.). Persistence Barcodes [32] are a related technique that involves constructing a type of signature for a manifold (or simply a shape) that uses tangent complexes to detect and characterize local edges and corners.

Additional algorithms have been proposed for constructing meaningful functions on the point samples in $\mathbb{R}^N$. To solve a semi-supervised learning problem, a method called Laplacian Eigenmaps [10] has been proposed that involves forming an adjacency graph for the data in $\mathbb{R}^N$, computing eigenfunctions of the Laplacian operator on the graph (which form a basis for $L_2$ on the graph), and using these functions to train a classifier on the data. The resulting classifiers have been used for handwritten digit recognition, document classification, and phoneme classification. (The $M$ smoothest eigenfunctions can also be used to embed the manifold in $\mathbb{R}^M$, similar to the approaches described above.) A related method called Diffusion
Wavelets [41] uses powers of the diffusion operator to model scale on the manifold, then constructs wavelets 
to capture local behavior at each scale. The result is a wavelet transform adapted not to geodesic distance 
but to diffusion distance, which measures (roughly) the number of paths connecting two points. 
7.2 The Johnson-Lindenstrauss lemma
7.2.1 Fundamentals

As with the above techniques in manifold learning, the Johnson-Lindenstrauss (JL) lemma [70], [2], [43], [69] provides a method for dimensionality reduction of a set of data in $\mathbb{R}^N$. Unlike manifold-based methods, however, the JL lemma can be used for any arbitrary set $Q$ of points in $\mathbb{R}^N$; the data set is not assumed to have any a priori structure.

Despite the apparent lack of structure in an arbitrary point cloud data set, the JL lemma suggests that there does exist a method for dimensionality reduction of that data set that can preserve key information while mapping the data to a lower-dimensional space $\mathbb{R}^M$. In particular, the original formulation of the JL lemma [70] states that there exists a Lipschitz mapping $\Phi : \mathbb{R}^N \to \mathbb{R}^M$ with $M = O(\log(\#Q))$ such that all pairwise distances between points in $Q$ are approximately preserved. This fact is useful for solving problems such as Approximate Nearest Neighbor [69], in which one desires the nearest point in $Q$ to some query point $y \in \mathbb{R}^N$ (but a solution not much further than the optimal point is also acceptable). Such problems can be solved significantly more quickly in $\mathbb{R}^M$ than in $\mathbb{R}^N$.

Recent reformulations of the JL lemma propose random linear operators that, with high probability, will 
ensure a near isometric embedding. These typically build on concentration of measure results such as the 
following. 

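Lemma 7.1:

[2], [43] Let $x \in \mathbb{R}^N$, fix $0 < \epsilon < 1$, and let $\Phi$ be a matrix constructed in one of the following two manners:

1. $\Phi$ is a random $M \times N$ matrix with i.i.d. $\mathcal{N}(0, \sigma^2)$ entries, where $\sigma^2 = 1/N$, or

2. $\Phi$ is a random orthoprojector from $\mathbb{R}^N$ to $\mathbb{R}^M$.

Then with probability exceeding

$$1 - 2\exp\left( -\frac{M(\epsilon^2/2 - \epsilon^3/3)}{2} \right), \qquad (7.1)$$

the following holds:

$$(1 - \epsilon)\sqrt{\frac{M}{N}} \le \frac{\|\Phi x\|_2}{\|x\|_2} \le (1 + \epsilon)\sqrt{\frac{M}{N}}. \qquad (7.2)$$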




The random orthoprojector referred to above is clearly related to the first case (simple matrix multiplication by a Gaussian $\Phi$) but subtly different; one could think of constructing a random Gaussian $\Phi$, then using Gram-Schmidt to orthonormalize the rows before multiplying $x$. We note also that simple rescaling of $\Phi$ can be used to eliminate the $\sqrt{M/N}$ in (7.2); however, we prefer this formulation for later reference.
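Lemma 7.1 can be probed empirically. The sketch below assumes NumPy, draws many matrices of each type (the orthoprojector is built by orthonormalizing the rows of a Gaussian matrix with a QR factorization, which plays the role of Gram-Schmidt), and records how often $\|\Phi x\|_2 / \|x\|_2$ falls inside the interval in (7.2). The dimensions, $\epsilon$, and trial count are arbitrary; the empirical fractions are typically well above the probability in (7.1), which is a worst-case guarantee.

```python
import numpy as np

rng = np.random.default_rng(7)
N, M, eps, trials = 512, 64, 0.3, 200
x = rng.standard_normal(N)                       # any fixed vector

def gaussian_phi():
    # Case 1: i.i.d. N(0, sigma^2) entries with sigma^2 = 1/N
    return rng.normal(0.0, np.sqrt(1.0 / N), size=(M, N))

def orthoprojector_phi():
    # Case 2: orthonormalize the rows of a Gaussian matrix (QR stands in for Gram-Schmidt)
    Q, _ = np.linalg.qr(rng.standard_normal((N, M)))   # N x M, orthonormal columns
    return Q.T                                         # M x N, orthonormal rows

lo, hi = (1 - eps) * np.sqrt(M / N), (1 + eps) * np.sqrt(M / N)    # interval in (7.2)
guarantee = 1 - 2 * np.exp(-M * (eps**2 / 2 - eps**3 / 3) / 2)    # probability in (7.1)

fractions = {}
for name, make_phi in [("gaussian", gaussian_phi), ("orthoprojector", orthoprojector_phi)]:
    ratios = np.array([np.linalg.norm(make_phi() @ x) / np.linalg.norm(x) for _ in range(trials)])
    fractions[name] = np.mean((ratios >= lo) & (ratios <= hi))
    print(f"{name}: empirical {fractions[name]:.3f} vs guaranteed > {guarantee:.3f}")
```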

By using the union bound over all $\binom{\#Q}{2}$ pairs of distinct points in $Q$, Lemma 7.1 (p. 33) can be used to prove a randomized version of the Johnson-Lindenstrauss lemma.

Lemma 7.2: Johnson-Lindenstrauss
Let $Q$ be a finite collection of points in $\mathbb{R}^N$. Fix $0 < \epsilon < 1$ and $\beta > 0$. Set

$$M \ge \left(\frac{4 + 2\beta}{\epsilon^2/2 - \epsilon^3/3}\right)\ln(\#Q). \qquad (7.3)$$

Let $\Phi$ be a matrix constructed in one of the following two manners:

1. $\Phi$ is a random $M \times N$ matrix with i.i.d. $\mathcal{N}(0, \sigma^2)$ entries, where $\sigma^2 = 1/N$, or

2. $\Phi$ is a random orthoprojector from $\mathbb{R}^N$ to $\mathbb{R}^M$.

Then with probability exceeding $1 - (\#Q)^{-\beta}$, the following statement holds: for every $x, y \in Q$,

$$(1-\epsilon)\sqrt{\frac{M}{N}} \le \frac{\|\Phi x - \Phi y\|_2}{\|x - y\|_2} \le (1+\epsilon)\sqrt{\frac{M}{N}}. \qquad (7.4)$$
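The following sketch, again assuming NumPy and using hypothetical helper names, computes the smallest integer $M$ allowed by (7.3) and then measures the worst-case normalized distortion over all pairs of a random point cloud, which (7.4) asks to stay within $[1-\epsilon, 1+\epsilon]$. A single draw only illustrates the statement; it does not verify the probability bound.

```python
import itertools

import numpy as np


def jl_dimension(num_points, eps, beta):
    """Smallest integer M satisfying (7.3)."""
    return int(np.ceil((4 + 2 * beta) / (eps**2 / 2 - eps**3 / 3) * np.log(num_points)))


def distortion_range(points, Phi):
    """Min and max over distinct pairs of ||Phi(x - y)|| / (||x - y|| sqrt(M/N))."""
    M, N = Phi.shape
    scale = np.sqrt(M / N)
    ratios = []
    for i, j in itertools.combinations(range(len(points)), 2):
        d = points[i] - points[j]
        ratios.append(np.linalg.norm(Phi @ d) / np.linalg.norm(d) / scale)
    return min(ratios), max(ratios)


rng = np.random.default_rng(1)
N, num_points, eps, beta = 2000, 50, 0.5, 1.0
M = jl_dimension(num_points, eps, beta)          # 282 for these settings
points = rng.normal(size=(num_points, N))
Phi = rng.normal(0.0, np.sqrt(1.0 / N), size=(M, N))
lo, hi = distortion_range(points, Phi)
print(M, lo, hi)
```

Note that $M$ depends only logarithmically on the number of points and not at all on $N$.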

In fact, [2] establishes that both Lemma 7.1 and Lemma 7.2 also
hold when the elements of $\Phi$ are chosen i.i.d. from a Rademacher distribution ($\pm\sigma$ with equal
probability 1/2) or from a similar ternary distribution ($\pm\sqrt{3}\sigma$ with equal probability 1/6; 0 with probability
2/3). These discrete distributions can further improve the computational benefits of the JL lemma.
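A sketch of the two discrete distributions from [2], assuming NumPy; `rademacher_matrix` and `ternary_matrix` are hypothetical names. Both use $\sigma^2 = 1/N$ so that the entries have the same variance as in case 1 of Lemma 7.1, and the ternary matrix is about two-thirds zeros, which is where the computational savings come from.

```python
import numpy as np


def rademacher_matrix(M, N, rng):
    """Entries +/- sigma with probability 1/2 each, where sigma^2 = 1/N."""
    sigma = np.sqrt(1.0 / N)
    return sigma * rng.choice([-1.0, 1.0], size=(M, N))


def ternary_matrix(M, N, rng):
    """Entries +/- sqrt(3) sigma w.p. 1/6 each and 0 w.p. 2/3, where sigma^2 = 1/N."""
    s = np.sqrt(3.0 / N)
    return rng.choice([-s, 0.0, s], size=(M, N), p=[1 / 6, 2 / 3, 1 / 6])


rng = np.random.default_rng(2)
N, M = 1000, 200
x = rng.normal(size=N)
scale = np.sqrt(M / N)
for make in (rademacher_matrix, ternary_matrix):
    Phi = make(M, N, rng)
    # Same normalized ratio as in (7.2); it should again be close to 1.
    print(make.__name__, np.linalg.norm(Phi @ x) / np.linalg.norm(x) / scale)
```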

7.2.2 Connections with compressed sensing

In the following module on Compressed Sensing we will discuss further topics in dimensionality reduction that 
relate to the JL lemma. In particular, as discussed in Connections with dimensionality reduction (Section 8.6: 
Connections with dimensionality reduction), the core mechanics of Compressed Sensing can be interpreted 
in terms of a stable embedding that arises for the family of K-sparse signals when observed with random 
measurements, and this stable embedding can be proved using the JL lemma. Furthermore, as discussed 
in Stable embeddings of manifolds (Section 8.7: Stable embeddings of manifolds), one can ensure a stable 
embedding of families of signals obeying manifold models under a sufficient number of random projections, 
with the theory again following from the JL lemma. 



Chapter 8 

Compressed Sensing 1 



A new theory known as Compressed Sensing (CS) has recently emerged that can also be categorized as a 
type of dimensionality reduction. Like manifold learning, CS is strongly model-based (relying on sparsity in 
particular). However, unlike many of the standard techniques in dimensionality reduction (such as manifold 
learning or the JL lemma), the goal of CS is to maintain a low-dimensional representation of a signal x from 
which a faithful approximation to x can be recovered. In a sense, this more closely resembles the traditional 
problem of data compression (see Compression (Chapter 6)). In CS, however, the encoder requires no a 
priori knowledge of the signal structure. Only the decoder uses the model (sparsity) to recover the signal. 
We justify such an approach again using geometric arguments. 

8.1 Motivation 

Consider a signal $x \in \mathbb{R}^N$, and suppose that the basis $\Psi$ provides a $K$-sparse representation of $x$:

$$x = \Psi\alpha, \qquad (8.1)$$

with $\|\alpha\|_0 = K$. (In this section, we focus on exactly $K$-sparse signals, though many of the key ideas
translate to compressible signals [28], [54]. In addition, we note that the CS concepts are also extendable to
tight frames.)

As we discussed in Compression (Chapter 6), the standard procedure for compressing sparse signals,
known as transform coding, is to (i) acquire the full $N$-sample signal $x$; (ii) compute the complete set of
transform coefficients $\alpha$; (iii) locate the $K$ largest, significant coefficients and discard the (many) small
coefficients; (iv) encode the values and locations of the largest coefficients.

This procedure has three inherent inefficiencies: First, for a high-dimensional signal, we must start with
a large number of samples $N$. Second, the encoder must compute all $N$ of the transform coefficients $\alpha$,
even though it will discard all but $K$ of them. Third, the encoder must encode the locations of the large
coefficients, which requires increasing the coding rate since the locations change with each signal.
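To make steps (i)-(iv) concrete, here is a minimal sketch assuming NumPy, with an orthonormal DCT-II matrix standing in for the sparsifying basis $\Psi$ (the basis choice and the helper names are assumptions, not part of the text). The encoder touches all $N$ coefficients and must ship the $K$ locations alongside the $K$ values, which are exactly the inefficiencies listed above.

```python
import numpy as np


def dct_basis(N):
    """Orthonormal DCT-II basis; column k is the basis vector psi_k."""
    n = np.arange(N)
    Psi = np.cos(np.pi * (n[:, None] + 0.5) * n[None, :] / N)
    Psi[:, 0] *= np.sqrt(1.0 / N)
    Psi[:, 1:] *= np.sqrt(2.0 / N)
    return Psi


def transform_encode(x, Psi, K):
    """Steps (ii)-(iv): compute all coefficients, keep the K largest, return locations and values."""
    alpha = Psi.T @ x                                     # (ii) every one of the N coefficients
    locations = np.sort(np.argsort(np.abs(alpha))[-K:])  # (iii) K largest magnitudes
    return locations, alpha[locations]                   # (iv) what must be encoded


def transform_decode(locations, values, Psi):
    """Rebuild the signal from the K retained coefficients."""
    alpha_hat = np.zeros(Psi.shape[1])
    alpha_hat[locations] = values
    return Psi @ alpha_hat


rng = np.random.default_rng(3)
N, K = 256, 10
Psi = dct_basis(N)
alpha = np.zeros(N)
alpha[rng.choice(N, size=K, replace=False)] = rng.normal(size=K)
x = Psi @ alpha                                           # (i) the full N-sample signal
locations, values = transform_encode(x, Psi, K)
x_hat = transform_decode(locations, values, Psi)
```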

8.2 Incoherent projections 

This raises a simple question: For a given signal, is it possible to directly estimate the set of large $\alpha(n)$'s that
will not be discarded? While this seems improbable, Candès, Romberg, and Tao [23], [28] and Donoho [54]
have shown that a reduced set of projections can contain enough information to reconstruct sparse signals.
An offshoot of this work, often referred to as Compressed Sensing (CS) [22], [28], [24], [25], [21], [54], [57],
has emerged that builds on this principle.



¹This content is available online at <http://cnx.org/content/m18733/1.5/>.
In CS, we do not measure or encode the $K$ significant $\alpha(n)$ directly. Rather, we measure and encode
$M < N$ projections $y(m) = \langle x, \phi_m^T \rangle$ of the signal onto a second set of functions $\{\phi_m\}$, $m = 1, 2, \ldots, M$.
In matrix notation, we measure

$$y = \Phi x, \qquad (8.2)$$

where $y$ is an $M \times 1$ column vector and the measurement basis matrix $\Phi$ is $M \times N$ with each row a basis
vector $\phi_m$. Since $M < N$, recovery of the signal $x$ from the measurements $y$ is ill-posed in general; however,
the additional assumption of signal sparsity makes recovery possible and practical.

The CS theory tells us that when certain conditions hold, namely that the functions $\{\phi_m\}$ cannot sparsely
represent the elements of the basis $\{\psi_n\}$ (a condition known as incoherence of the two dictionaries [28],
[23], [54], [99]) and the number of measurements $M$ is large enough, then it is indeed possible to recover the
set of large $\{\alpha(n)\}$ (and thus the signal $x$) from a similarly sized set of measurements $y$. This incoherence
property holds for many pairs of bases, including, for example, delta spikes and the sine waves of a Fourier
basis, or the Fourier basis and wavelets. Significantly, this incoherence also holds with high probability
between an arbitrary fixed basis and a randomly generated one.
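One common way to quantify incoherence, borrowed from the CS literature rather than defined in this text, is the mutual coherence $\mu(\Phi, \Psi) = \sqrt{N}\max_{m,n} |\langle \phi_m, \psi_n \rangle|$, which ranges from 1 (maximally incoherent) to $\sqrt{N}$ (maximally coherent). The NumPy sketch below (helper name hypothetical) evaluates it for spikes versus the normalized Fourier basis, for spikes versus themselves, and for spikes versus a random orthonormal basis.

```python
import numpy as np


def mutual_coherence(Phi, Psi):
    """sqrt(N) * max |<phi_m, psi_n>|, with phi_m the rows of Phi and psi_n the columns of Psi."""
    N = Psi.shape[0]
    return np.sqrt(N) * np.max(np.abs(Phi.conj() @ Psi))


N = 256
spikes = np.eye(N)
fourier = np.fft.fft(np.eye(N)) / np.sqrt(N)            # unitary DFT matrix
random_basis, _ = np.linalg.qr(np.random.default_rng(4).normal(size=(N, N)))

print(mutual_coherence(spikes, fourier))       # approximately 1: spikes and sinusoids
print(mutual_coherence(spikes, spikes))        # sqrt(N) = 16: maximally coherent
print(mutual_coherence(spikes, random_basis))  # small compared with sqrt(N)
```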

8.3 Methods for signal recovery 

Although the problem of recovering $x$ from $y$ is ill-posed in general (because $x \in \mathbb{R}^N$, $y \in \mathbb{R}^M$, and $M < N$),
it is indeed possible to recover sparse signals from CS measurements. Given the measurements $y = \Phi x$,
there are infinitely many candidate signals in the shifted nullspace $\mathcal{N}(\Phi) + x$ that could generate
the same measurements $y$ (see Linear Models from Low-Dimensional Signal Models (Section 4.1: Linear
models)). Recovery of the correct signal $x$ can be accomplished by seeking a sparse solution among these
candidates.
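The ambiguity is easy to see numerically. In the sketch below (assuming NumPy; the dimensions and support are chosen arbitrarily), an orthonormal basis for $\mathcal{N}(\Phi)$ is read off from the singular value decomposition, and adding a nullspace vector to a sparse $x$ produces a dense signal with exactly the same measurements.

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 20, 50
Phi = rng.normal(0.0, np.sqrt(1.0 / N), size=(M, N))

x = np.zeros(N)
x[[3, 17, 41]] = [1.0, -2.0, 0.5]              # a 3-sparse signal
y = Phi @ x

# The last N - M right singular vectors span the nullspace (Phi has full row rank M).
_, _, Vt = np.linalg.svd(Phi)
null_basis = Vt[M:].T                           # N x (N - M)
x_alt = x + null_basis @ rng.normal(size=N - M)

print(np.allclose(Phi @ x_alt, y))              # True: identical measurements
print(np.count_nonzero(np.abs(x_alt) > 1e-8))  # typically N: no longer sparse
```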

8.3.1 Recovery via combinatorial optimization 

Supposing that $x$ is exactly $K$-sparse in the dictionary $\Psi$, recovery of $x$ from $y$ can be formulated as
the $\ell_0$ minimization

$$\widehat{\alpha} = \arg\min \|\alpha\|_0 \quad \text{s.t.} \quad y = \Phi\Psi\alpha. \qquad (8.3)$$

Given some technical conditions on $\Phi$ and $\Psi$ (see Theorem 8.1 below), with high probability this
optimization problem returns the proper $K$-sparse solution $\alpha$, from which the true $x$ may be constructed.
(Thanks to the incoherence between the two bases, if the original signal is sparse in the $\alpha$ coefficients, then
no other set of sparse signal coefficients $\alpha'$ can yield the same projections $y$.) We note that the recovery
program (8.3) can be interpreted as finding a $K$-term approximation to $y$ from the columns of the dictionary
$\Phi\Psi$, which we call the holographic basis because of the complex pattern in which it encodes the sparse
signal coefficients [54].

In principle, remarkably few incoherent measurements are required to recover a $K$-sparse signal via $\ell_0$
minimization. Clearly, more than $K$ measurements must be taken to avoid ambiguity; the following theorem
(which is proved in [7]) establishes that $K + 1$ random measurements will suffice. (Similar results were
established by Venkataramani and Bresler [103].)

Theorem 8.1: 

Let $\Psi$ be an orthonormal basis for $\mathbb{R}^N$, and let $1 \le K < N$. Then the following statements hold:

1. Let $\Phi$ be an $M \times N$ measurement matrix with i.i.d. Gaussian entries with $M \ge 2K$. Then
with probability one the following statement holds: all signals $x = \Psi\alpha$ having expansion
coefficients $\alpha \in \mathbb{R}^N$ that satisfy $\|\alpha\|_0 = K$ can be recovered uniquely from the $M$-dimensional
measurement vector $y = \Phi x$ via the $\ell_0$ optimization (8.3).

2. Let $x = \Psi\alpha$ such that $\|\alpha\|_0 = K$. Let $\Phi$ be an $M \times N$ measurement matrix with i.i.d.
Gaussian entries (notably, independent of $x$) with $M \ge K + 1$. Then with probability one the
following statement holds: $x$ can be recovered uniquely from the $M$-dimensional measurement
vector $y = \Phi x$ via the $\ell_0$ optimization (8.3).

3. Let $\Phi$ be an $M \times N$ measurement matrix, where $M \le K$. Then, aside from pathological cases
(specified in the proof), no signal $x = \Psi\alpha$ with $\|\alpha\|_0 = K$ can be uniquely recovered from
the $M$-dimensional measurement vector $y = \Phi x$.

The second statement of the theorem differs from the first in the following respect: when $K < M < 2K$,
there will necessarily exist $K$-sparse signals $x$ that cannot be uniquely recovered from the $M$-dimensional
measurement vector $y = \Phi x$. However, these signals form a set of measure zero within the set of all $K$-sparse
signals and can safely be avoided if $\Phi$ is randomly generated independently of $x$.
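A brute-force sketch of the $\ell_0$ program (8.3), assuming NumPy and taking $\Psi = I$ so that $\Phi\Psi = \Phi$; `l0_recover` is a hypothetical helper. It searches supports in order of increasing size and returns the first one that reproduces $y$ exactly, using $M = K + 1$ Gaussian measurements as in statement 2 of Theorem 8.1. The nested loop over $\binom{N}{k}$ supports is what makes this approach impractical beyond toy sizes.

```python
import itertools

import numpy as np


def l0_recover(A, y, K_max, tol=1e-9):
    """Return the sparsest alpha (at most K_max nonzeros) with A @ alpha == y, else None."""
    N = A.shape[1]
    for k in range(K_max + 1):
        for support in itertools.combinations(range(N), k):
            alpha = np.zeros(N)
            if k > 0:
                cols = list(support)
                coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
                alpha[cols] = coef
            # Accept the first support that explains the measurements exactly.
            if np.linalg.norm(A @ alpha - y) <= tol * max(1.0, np.linalg.norm(y)):
                return alpha
    return None


rng = np.random.default_rng(6)
N, K = 12, 2
M = K + 1
A = rng.normal(size=(M, N))                 # Gaussian Phi, with Psi = I
alpha_true = np.zeros(N)
alpha_true[[2, 9]] = [1.5, -0.7]
y = A @ alpha_true
alpha_hat = l0_recover(A, y, K_max=K)
```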

Unfortunately, as discussed in Nonlinear Approximation from Approximation (Section 5.2: Nonlinear
approximation), solving this $\ell_0$ optimization problem is prohibitively complex. Yet another challenge is
robustness; in the setting of Theorem 8.1 (Section 8.3.1: Recovery via combinatorial optimization), the
recovery may be very poorly conditioned. In fact, both of these considerations
(computational complexity and robustness) can be addressed, but at the expense of slightly more measurements.

8.3.2 Recovery via convex optimization 

The practical revelation that supports the new CS theory is that it is not necessary to solve the $\ell_0$-minimization
problem to recover $\alpha$. In fact, a much easier problem yields an equivalent solution (thanks
again to the incoherence of the bases); we need only solve for the coefficients $\alpha$ of smallest $\ell_1$ norm that agree with
the measurements $y$ [23], [22], [28], [24], [25], [21], [54], [57]:

$$\widehat{\alpha} = \arg\min \|\alpha\|_1 \quad \text{s.t.} \quad y = \Phi\Psi\alpha. \qquad (8.4)$$

As discussed in Nonlinear Approximation from Approximation (Section 5.2: Nonlinear approximation), this
optimization problem, also known as Basis Pursuit [35], is significantly more approachable and can be
solved with traditional linear programming techniques whose computational complexities are polynomial in
$N$.
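Basis Pursuit becomes a linear program after the standard split $\alpha = u - v$ with $u, v \ge 0$, so that $\|\alpha\|_1 = \sum_n (u_n + v_n)$ at the optimum. The sketch below assumes NumPy and SciPy's `scipy.optimize.linprog` with the HiGHS solver; `basis_pursuit` is a hypothetical name, $\Psi = I$ for simplicity, and the problem sizes are illustrative only.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, y):
    """Solve min ||alpha||_1 subject to A @ alpha = y via the LP split alpha = u - v."""
    N = A.shape[1]
    c = np.ones(2 * N)                 # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])          # A @ (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:N] - res.x[N:]


rng = np.random.default_rng(7)
N, K, M = 128, 5, 40                   # M = 8K measurements
Phi = rng.normal(size=(M, N))
alpha_true = np.zeros(N)
alpha_true[rng.choice(N, size=K, replace=False)] = rng.normal(size=K)
alpha_hat = basis_pursuit(Phi, Phi @ alpha_true)
```

The oversampling here ($M = 8K$) is generous; the theory below describes how small the factor can be made.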

There is no free lunch, however; according to the theory, more than K + 1 measurements are required 
in order to recover sparse signals via Basis Pursuit. Instead, one typically requires M > cK measurements, 
where c > 1 is an oversampling factor. As an example, we quote a result asymptotic in N. For simplicity, 
we assume that the sparsity scales linearly with N; that is, K = SN, where we call S the sparsity rate. 

Theorem 8.2: 

[27], [56], [53] Set $K = SN$ with $0 < S < 1$. Then there exists an oversampling factor $c(S) = O(\log(1/S))$,
$c(S) > 1$, such that, for a $K$-sparse signal $x$ in the basis $\Psi$, the following statements
hold:

1. The probability of recovering $x$ via Basis Pursuit from $(c(S) + \epsilon)K$ random projections,
$\epsilon > 0$, converges to one as $N \to \infty$.

2. The probability of recovering $x$ via Basis Pursuit from $(c(S) - \epsilon)K$ random projections,
$\epsilon > 0$, converges to zero as $N \to \infty$.

In an illuminating series of recent papers, Donoho and Tanner [53], [56], [62] have characterized the 
oversampling factor c(S) precisely (see also "The geometry of Compressed Sensing" (Section 8.5: The 
geometry of Compressed Sensing)). With appropriate oversampling, reconstruction via Basis Pursuit is also 
provably robust to measurement noise and quantization error [23]. 

We often use the abbreviated notation $c$ to describe the oversampling factor required in various settings
even though $c(S)$ depends on the sparsity $K$ and signal length $N$.

A CS recovery example on the Cameraman test image is shown in Figure 8.1. In this case, with $M = 4K$
we achieve near-perfect recovery of the sparse measured image.






CHAPTER 8. COMPRESSED SENSING 



[Figure 8.1 image: CS reconstruction of the Cameraman image]



Figure 8.1: Compressive sensing reconstruction of the nonlinear approximation Cameraman image from
Figure 5.2(b). Using $M = 16384$ random measurements of the $K$-term nonlinear approximation image
(where $K = 4096$), we solve an $\ell_1$-minimization problem to obtain the reconstruction shown above. The
MSE with respect to the measured image is 0.08, so the reconstruction is virtually perfect.



8.3.3 Recovery via greedy pursuit 

At the expense of slightly more measurements, iterative greedy algorithms such as Orthogonal Matching
Pursuit (OMP) [99], Matching Pursuit (MP) [83], and Tree Matching Pursuit (TMP) [64], [76] have also been
proposed to recover the signal $x$ from the measurements $y$ (see Nonlinear Approximation from Approximation
(Section 5.2: Nonlinear approximation)). In CS applications, OMP requires $c \approx 2\ln(N)$ [99] to succeed with
high probability. OMP is also guaranteed to converge within $M$ iterations. We note that Tropp and Gilbert
require the OMP algorithm to succeed in the first $K$ iterations [99]; however, in our simulations, we allow
the algorithm to run up to the maximum of $M$ possible iterations. The choice of an appropriate practical
stopping criterion (likely somewhere between $K$ and $M$ iterations) is a subject of current research in the CS
community.
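A compact OMP sketch, assuming NumPy; `omp` is a hypothetical name, and its tolerance-based early exit is just one possible stopping rule among those discussed above. Following the text, it allows up to $M$ iterations, and the example takes $M \approx 2\ln(N)K$ Gaussian measurements with $\Psi = I$.

```python
import numpy as np


def omp(A, y, max_iter, tol=1e-10):
    """Orthogonal Matching Pursuit: greedily grow a support, re-fitting by least squares."""
    N = A.shape[1]
    col_norms = np.linalg.norm(A, axis=0)
    residual = y.astype(float)
    support, coef = [], np.zeros(0)
    for _ in range(max_iter):
        if np.linalg.norm(residual) <= tol * np.linalg.norm(y):
            break
        scores = np.abs(A.T @ residual) / col_norms   # normalized correlation per column
        if support:
            scores[support] = 0.0                     # never reselect a column
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef           # orthogonal to the chosen columns
    alpha = np.zeros(N)
    alpha[support] = coef
    return alpha


rng = np.random.default_rng(8)
N, K = 256, 8
M = int(np.ceil(2 * np.log(N) * K))                   # c ~ 2 ln(N), so M = 89
Phi = rng.normal(size=(M, N))
alpha_true = np.zeros(N)
alpha_true[rng.choice(N, size=K, replace=False)] = rng.normal(size=K)
alpha_hat = omp(Phi, Phi @ alpha_true, max_iter=M)
```

Because each step re-fits all selected coefficients by least squares, an occasional wrong pick in an early iteration can still be corrected later, which is one reason to let the loop run beyond $K$ iterations.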



8.4 Impact and applications 

CS appears to be promising for a number of applications in signal acquisition and compression. Instead of
sampling a $K$-sparse signal $N$ times, only $cK$ incoherent measurements suffice, where $K$ can be orders of
magnitude less than $N$. Therefore, a sensor can transmit far fewer measurements to a receiver, which can
reconstruct the signal and then process it in any manner. Moreover, the $cK$ measurements need not be
manipulated in any way before being transmitted, except possibly for some quantization. Finally, independent
and identically distributed (i.i.d.) Gaussian or Bernoulli/Rademacher (random $\pm 1$) vectors provide a useful
universal basis that is, with high probability, incoherent with any fixed basis. Hence, when using a random basis, CS is universal in the
sense that the sensor can apply the same measurement mechanism no matter what basis the signal is sparse
in (and thus the coding algorithm is independent of the sparsity-inducing basis) [28], [54], [4].

These features of CS make it particularly intriguing for applications in remote sensing environments that
might involve low-cost, battery-operated wireless sensors, which have limited computational and
communication capabilities. Indeed, in many such environments one may be interested in sensing a collection of
signals using a network of low-cost sensors.




Other possible application areas of CS include imaging [97], medical imaging [23], [79], and RF
environments (where high-bandwidth signals may contain low-dimensional structures such as radar chirps) [63]. As
research continues into practical methods for signal recovery (see Section 8.3 (Methods for signal recovery)), 
additional work has focused on developing physical devices for acquiring random projections. Our group has 
developed, for example, a prototype digital CS camera based on a digital micromirror design [97]. Additional 
work suggests that standard components such as filters (with randomized impulse responses) could be useful 
in CS hardware devices [100]. 

8.5 The geometry of Compressed Sensing 

It is important to note that the core theory of CS draws from a number of deep geometric arguments. For
example, when viewed together, the CS encoding/decoding process can be interpreted as a linear projection
$\Phi: \mathbb{R}^N \to \mathbb{R}^M$ followed by a nonlinear mapping $\Delta: \mathbb{R}^M \to \mathbb{R}^N$. In a very general sense, one may naturally
ask, for a given class of signals $T \subset \mathbb{R}^N$ (such as the set of $K$-sparse signals or the set of signals with coefficients
$\|\alpha\|_1 \le 1$), what encoder/decoder pair $(\Phi, \Delta)$ will ensure the best reconstruction (minimax distortion) of
all signals in $T$. This best-case performance is proportional to what is known as the Gluskin $n$-width [71],
[67] of $T$ (in our setting $n = M$), which in turn has a geometric interpretation. Roughly speaking, the
Gluskin $n$-width seeks the $(N - n)$-dimensional slice through $T$ that yields signals of greatest energy. This
$n$-width bounds the best-case performance of CS on classes of compressible signals, and one of the hallmarks
of CS is that, given a sufficient number of measurements, this optimal performance is achieved (to within a
constant) [54], [48].

Additionally, one may view the $\ell_0/\ell_1$ equivalence problem geometrically. In particular, given the
measurements $y = \Phi x$, we have an $(N - M)$-dimensional hyperplane $\mathcal{H}_y = \{x' \in \mathbb{R}^N : y = \Phi x'\} = \mathcal{N}(\Phi) + x$ of
feasible signals that could account for the measurements $y$. Supposing the original signal $x$ is $K$-sparse, the
$\ell_1$ recovery program will recover the correct solution $x$ if and only if $\|x'\|_1 > \|x\|_1$ for every other signal
$x' \in \mathcal{H}_y$ on the hyperplane. This happens only if the hyperplane $\mathcal{H}_y$ (which passes through $x$) does not "cut
into" the $\ell_1$-ball of radius $\|x\|_1$. This $\ell_1$-ball is a polytope, on which $x$ belongs to a $(K-1)$-dimensional
"face." If $\Phi$ is a random matrix with i.i.d. Gaussian entries, then the hyperplane $\mathcal{H}_y$ will have random
orientation. To answer the question of how $M$ must relate to $K$ in order to ensure reliable recovery, it
helps to observe that a randomly generated hyperplane $\mathcal{H}$ will have a greater chance to slice into the $\ell_1$-ball as
$\dim(\mathcal{H}) = N - M$ grows (or as $M$ shrinks) or as the dimension $K - 1$ of the face on which $x$ lives grows. Such
geometric arguments have been made precise by Donoho and Tanner [53], [56], [62] and used to establish a
series of sharp bounds on CS recovery.

8.6 Connections with dimensionality reduction 

We have also identified [4] a fundamental connection between CS and the JL lemma. In order to make
this connection, we considered the Restricted Isometry Property (RIP), which has been identified as a
key property of the CS projection operator $\Phi$ to ensure stable signal recovery. We say $\Phi$ has RIP of order
$K$ if for every $K$-sparse signal $x$,

$$(1-\epsilon)\sqrt{\frac{M}{N}} \le \frac{\|\Phi x\|_2}{\|x\|_2} \le (1+\epsilon)\sqrt{\frac{M}{N}}. \qquad (8.5)$$

A random $M \times N$ matrix with i.i.d. Gaussian entries can be shown to have this property with high
probability if $M = O(K\log(N/K))$.

While the JL lemma concerns pairwise distances within a finite cloud of points, the RIP concerns isometric
embedding of an infinite number of points (comprising a union of $K$-dimensional subspaces in $\mathbb{R}^N$). However,
the RIP can in fact be derived by constructing an effective sampling of $K$-sparse signals in $\mathbb{R}^N$, using the
JL lemma to ensure isometric embeddings for each of these points, and then arguing that the RIP must hold
true for all $K$-sparse signals. (See [4] for the full details.)
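Certifying the RIP requires checking every one of the $\binom{N}{K}$ supports, which is intractable, but sampling supports gives a quick empirical lower bound on the smallest $\epsilon$ for which (8.5) can hold. The NumPy sketch below (helper name hypothetical) uses the extreme singular values of the column submatrix $\Phi_T$, rescaled by $\sqrt{M/N}$, over randomly drawn supports $T$ of size $K$.

```python
import numpy as np


def sampled_rip_epsilon(Phi, K, num_supports, rng):
    """Largest deviation from 1 of the rescaled singular values of Phi_T over sampled supports T.

    This is only a lower bound on the true RIP epsilon, since most supports are never tried.
    """
    M, N = Phi.shape
    scale = np.sqrt(M / N)
    worst = 0.0
    for _ in range(num_supports):
        T = rng.choice(N, size=K, replace=False)
        s = np.linalg.svd(Phi[:, T], compute_uv=False) / scale
        worst = max(worst, 1.0 - s.min(), s.max() - 1.0)
    return worst


rng = np.random.default_rng(9)
N, K, M = 512, 5, 100
Phi = rng.normal(0.0, np.sqrt(1.0 / N), size=(M, N))
eps_hat = sampled_rip_epsilon(Phi, K, num_supports=200, rng=rng)
print(eps_hat)
```

Increasing $M$ with $K$ and $N$ fixed shrinks the observed deviation, consistent with the $M = O(K\log(N/K))$ scaling.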




8.7 Stable embeddings of manifolds 

Finally, we have also shown that the JL lemma can lead to extensions of CS to other concise signal models.
In particular, while conventional CS theory concerns sparse signal models, it is also possible to consider
manifold-based signal models. Just as random projections can preserve the low-dimensional geometry (the
union of $K$-dimensional subspaces) that corresponds to a sparse signal family, random projections can also guarantee a
stable embedding of a low-dimensional signal manifold. We have the following result, which states that an
RIP-like property holds for families of manifold-modeled signals.

Theorem 8.3: 

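Let $\mathcal{M}$ be a compact $K$-dimensional Riemannian submanifold of $\mathbb{R}^N$ having condition number $1/\tau$,
volume $V$, and geodesic covering regularity $R$. Fix $0 < \epsilon < 1$ and $0 < \rho < 1$. Let $\Phi$ be a random
$M \times N$ orthoprojector with

$$M = O\left(\frac{K\log(NVR\tau^{-1}\epsilon^{-1})\log(1/\rho)}{\epsilon^2}\right). \qquad (8.6)$$

If $M \le N$, then with probability at least $1 - \rho$ the following statement holds: for every pair of
points $x_1, x_2 \in \mathcal{M}$,

$$(1-\epsilon)\sqrt{\frac{M}{N}} \le \frac{\|\Phi x_1 - \Phi x_2\|_2}{\|x_1 - x_2\|_2} \le (1+\epsilon)\sqrt{\frac{M}{N}}. \qquad (8.7)$$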

The proof of this theorem appears in [6] and again involves the JL lemma. Due to the limited complexity 
of a manifold model, it is possible to adequately characterize the geometry using a sufficiently fine sampling 
of points drawn from the manifold and its tangent spaces. In essence, manifolds with higher volume or with 
greater curvature have more complexity and require a more dense covering for application of the JL lemma; 
this leads to an increased number of measurements. The theorem also indicates that the requisite number of 
measurements depends on the geodesic covering regularity of the manifold, a minor technical concept which 
is also discussed in [6]. 

This theorem establishes that, like the class of $K$-sparse signals, a collection of signals described by a
$K$-dimensional manifold $\mathcal{M} \subset \mathbb{R}^N$ can have a stable embedding in an $M$-dimensional measurement space.
Moreover, the requisite number of random measurements $M$ is once again linearly proportional to the
information level (or number of degrees of freedom) $K$ in the concise model. This has a number of possible
implications for manifold-based signal processing. Manifold-modeled signals can be recovered from
compressive measurements (using a customized recovery algorithm adapted to the manifold model, in contrast with
sparsity-based recovery algorithms) [44], [104]; unknown parameters in parametric models can be estimated
from compressive measurements; multi-class estimation/classification problems can be addressed [44] by
considering multiple manifold models; and manifold learning algorithms may be efficiently executed by applying
them simply to the projection of a manifold-modeled data set to a low-dimensional measurement space [17].
(As an example, Figure 7.1(d) shows the result of applying the ISOMAP algorithm on a random projection
of a data set from $\mathbb{R}^{4096}$ down to $\mathbb{R}^{15}$; the underlying parameterization of the manifold is extracted with little
sacrifice in accuracy.) In all of this, it is not necessary to adapt the sensing protocol to the model; the only
change from sparsity-based CS would be the methods for processing or decoding the measurements. In the
future, more sophisticated concise models will likely lead to further improvements in signal understanding
from compressive measurements.
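As a toy illustration of Theorem 8.3 (not a check of its constants, which are only given up to $O(\cdot)$), the NumPy sketch below samples a one-dimensional manifold of shifted Gaussian pulses in $\mathbb{R}^N$, applies a random orthoprojector built as in Lemma 7.1, and reports the range of normalized pairwise distortions over the sampled points. The pulse family, its width, and the chosen $M$ are assumptions made for illustration.

```python
import itertools

import numpy as np


def pulse_manifold(thetas, N, width=0.02):
    """Rows are samples of x_theta(t) = exp(-(t - theta)^2 / (2 width^2)) on [0, 1]."""
    t = np.linspace(0.0, 1.0, N)
    return np.exp(-((t[None, :] - thetas[:, None]) ** 2) / (2 * width**2))


rng = np.random.default_rng(10)
N, M = 1024, 100
points = pulse_manifold(np.linspace(0.2, 0.8, 60), N)   # K = 1 degree of freedom

Q, _ = np.linalg.qr(rng.normal(size=(N, M)))
Phi = Q.T                                    # random M x N orthoprojector
scale = np.sqrt(M / N)

ratios = [np.linalg.norm(Phi @ (points[i] - points[j]))
          / np.linalg.norm(points[i] - points[j]) / scale
          for i, j in itertools.combinations(range(len(points)), 2)]
print(min(ratios), max(ratios))              # both typically close to 1
```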



Bibliography 



[1] Let it Wave, www.letitwave.fr. 

[2] D. Achlioptas. Database-friendly random projections. In Proc. Symp. Principles of Database Systems, 
2001. 

[3] F. Arandiga, A. Cohen, M. Doblas, R. Donat, and B. Matei. Sparse representations of images by edge 
adapted nonlinear multiscale transforms. In Proc. IEEE Int. Conf. Image Proc. (ICIP), Barcelona, 
Spain, Sept. 2003. 

[4] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. The Johnson-Lindenstrauss lemma meets
compressed sensing. 2006. Preprint.

[5] R. G. Baraniuk and D. L. Jones. Shear madness: new orthogonal bases and frames using chirp functions.
IEEE Trans. Signal Proc., 41(12):3543-3549, 1993.

[6] R. G. Baraniuk and M. B. Wakin. Random projections of smooth manifolds. Foundations of Compu- 
tational Mathematics, 2008. To Appear. 

[7] D. Baron, M. B. Wakin, M. F. Duarte, S. Sarvotham, and R. G. Baraniuk. Distributed compressed 
sensing. 2005. Preprint. 

[8] D. M. Bates and D. G. Watts. Nonlinear Regression Analysis and Its Applications. John Wiley and 
Sons, New York, 1988. 

[9] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. 
Neural Computation, 15(6), June 2003. 

[10] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. 
Neural Computation, 15(6), June 2003. 

[11] W. M. Boothby. An Introduction to Differentiable Manifolds and Riemannian Geometry. Academic 
Press, revised 2nd edition, 2003. 

[12] M. Brand. Charting a manifold. In Proc. Neural Inform. Processing Systems - NIPS, 2002. 

[13] D. S. Broomhead and M. Kirby. A new approach for dimensionality reduction: Theory and algorithms. 
SIAM J. of Applied Mathematics, 60(6), 2000. 

[14] D. S. Broomhead and M. Kirby. A new approach for dimensionality reduction: Theory and algorithms. 
SIAM J. of Applied Mathematics, 60(6), 2000. 

[15] D. S. Broomhead and M. J. Kirby. The Whitney reduction network: A method for computing
autoassociative graphs. Neural Computation, 13:2595-2616, 2001.

[16] D. S. Broomhead and M. J. Kirby. The Whitney reduction network: A method for computing
autoassociative graphs. Neural Computation, 13:2595-2616, 2001.




42 BIBLIOGRAPHY 

[17] C. Hegde, M. B. Wakin, and R. G. Baraniuk. Random projections for manifold learning. In Proc.
Neural Information Processing Systems (NIPS), December 2007.

[18] E. Candès and D. L. Donoho. New tight frames of curvelets and optimal representations of
objects with piecewise singularities. Comm. on Pure and Applied Math., 57:219-266, 2004.

[19] E. Candès and D. L. Donoho. New tight frames of curvelets and optimal representations of
objects with piecewise singularities. Comm. on Pure and Applied Math., 57:219-266, 2004.

[20] E. Candès and D. L. Donoho. New tight frames of curvelets and optimal representations of
objects with piecewise singularities. Comm. on Pure and Applied Math., 57:219-266, 2004.

[21] E. Candès and J. Romberg. Practical signal recovery from random projections. 2005. Preprint.

[22] E. Candès and J. Romberg. Quantitative robust uncertainty principles and optimally sparse
decompositions. Found. of Comp. Math., 2006. To appear.

[23] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction
from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2), February 2006.

[24] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate
measurements. Communications on Pure and Applied Mathematics, 2006. To appear.

[25] E. Candès and T. Tao. Decoding by linear programming. IEEE Trans. Inform. Theory, 51(12),
December 2005.

[26] E. Candès and T. Tao. Error correction via linear programming. Found. of Comp. Math., 2005.
Preprint.

[27] E. Candès and T. Tao. Error correction via linear programming. Found. of Comp. Math., 2005.
Preprint.

[28] E. Candès and T. Tao. Near optimal signal recovery from random projections and universal
encoding strategies. IEEE Trans. Inform. Theory, 2006. To appear.

[29] E. J. Candès and D. L. Donoho. Curvelets: A surprisingly effective nonadaptive representation
for objects with edges. In A. Cohen, C. Rabut, and L. L. Schumaker, editors, Curve and Surface Fitting.
Vanderbilt University Press, 1999.

[30] E. J. Candès and D. L. Donoho. Curvelets: A surprisingly effective nonadaptive representation
for objects with edges. In A. Cohen, C. Rabut, and L. L. Schumaker, editors, Curve and Surface Fitting.
Vanderbilt University Press, 1999.

[31] E. J. Candès and D. L. Donoho. Curvelets: A surprisingly effective nonadaptive representation
for objects with edges. In A. Cohen, C. Rabut, and L. L. Schumaker, editors, Curve and Surface
Fitting. Vanderbilt University Press, 1999.

[32] G. Carlsson, A. Zomorodian, A. Collins, and L. Guibas. Persistence barcodes for shapes. Int. J. of 
Shape Modeling. To appear. 

[33] V. Chandrasekaran, M. B. Wakin, D. Baron, and R. Baraniuk. Representation and compression of
multi-dimensional piecewise functions using surflets. IEEE Trans. Inf. Theory, 2008. To appear.

[34] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM J. on Sci. 
Comp., 20(1):33-61, 1998. 




[35] S. Chen, D. Donoho, and M. Saunders. Atomic decomposition by basis pursuit. SIAM J. on Sci.
Comp., 20(1):33-61, 1998.

[36] G. F. Clements. Entropies of several sets of real valued functions. Pacific J. Math., 13:1085-1095,
1963.

[37] A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore. Tree approximation and optimal encoding.
Appl. Comput. Harmon. Anal., 11:192-226, 2001.

[38] A. Cohen, W. Dahmen, I. Daubechies, and R. DeVore. Tree approximation and optimal encoding.
Appl. Comput. Harmon. Anal., 11:192-226, 2001.

[39] A. Cohen, I. Daubechies, O. G. Guleryuz, and M. T. Orchard. On the importance of combining wavelet- 
based nonlinear approximation with coding strategies. IEEE Trans. Inform. Theory, 48(7):1895-1921, 
July 2002. 

[40] A. Cohen, I. Daubechies, O. G. Guleryuz, and M. T. Orchard. On the importance of combining wavelet- 
based nonlinear approximation with coding strategies. IEEE Trans. Inform. Theory, 48(7):1895-1921, 
July 2002. 

[41] R. R. Coifman and M. Maggioni. Diffusion wavelets. Appl. Comput. Harmon. Anal., 2005. To appear.

[42] J. A. Costa and A. O. Hero. Geodesic entropic graphs for dimension and entropy estimation in manifold 
learning. IEEE Trans. Signal Processing, 52(8), August 2004. 

[43] S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. Technical report
TR-99-006, Berkeley, CA, 1999.

[44] M.A. Davenport, M.F. Duarte, M.B. Wakin, J.N. Laska, D. Takhar, K.F. Kelly, and R.G. Baraniuk. 
The smashed filter for compressive classification and target recognition. In Proc. Computational Imag- 
ing V at SPIE Electronic Imaging, January 2007. 

[45] R. A. DeVore. Nonlinear approximation. Acta Numerica, 7:51-150, 1998. 

[46] R. A. DeVore. Lecture notes on compressed sensing. Rice University ELEC 631 Course Notes, Spring 
2006. 

[47] R. A. DeVore. Lecture notes on compressed sensing. Rice University ELEC 631 Course Notes, Spring 
2006. 

[48] R. A. DeVore. Lecture notes on compressed sensing. Rice University ELEC 631 Course Notes, Spring 
2006. 

[49] R. A. DeVore, B. Jawerth, and B. J. Lucier. Image compression through wavelet transform coding. 
IEEE Trans. Inform. Theory, 38(2):719-746, Mar. 1992. 

[50] M. N. Do and M. Vetterli. Contourlets: A directional multiresolution image representation. In Proc. 
IEEE Int. Conf. Image Proc. (ICIP), Rochester, New York, Oct. 2002. 

[51] M. N. Do and M. Vetterli. The contourlet transform: An efficient directional multiresolution image 
representation. IEEE Trans. Image Processing, 2005. To appear. 

[52] M. N. Do and M. Vetterli. The contourlet transform: An efficient directional multiresolution image 
representation. IEEE Trans. Image Processing, 2005. To appear. 

[53] D. Donoho. High-dimensional centrally symmetric polytopes with neighborliness proportional to di- 
mension. January 2005. Preprint. 




[54] D. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4), April 2006. 

[55] D. Donoho and J. Tanner. Neighborliness of randomly-projected simplices in high dimensions. 2005. 
Preprint. 

[56] D. Donoho and J. Tanner. Neighborliness of randomly-projected simplices in high dimensions. 2005. 
Preprint. 

[57] D. Donoho and Y. Tsaig. Extensions of compressed sensing. 2004. Preprint. 

[58] D. L. Donoho. Unconditional bases are optimal bases for data compression and for statistical estima- 
tion. Appl. Comput. Harmon. Anal., 1(1):100-115, Dec. 1993. 

[59] D. L. Donoho. Denoising by soft-thresholding. IEEE Trans. Inform. Theory, 41(3):613-627, May 1995. 

[60] D. L. Donoho and C. E. Grimes. Hessian eigenmaps: Locally linear embedding techniques for
high-dimensional data. Proc. Natl. Acad. Sci. USA, 100(10):5591-5596, May 2003.

[61] D. L. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection
radically lowers dimension. Technical report 2006-11, Stanford University Department of Statistics,
2006.

[62] D. L. Donoho and J. Tanner. Counting faces of randomly-projected polytopes when the projection
radically lowers dimension. Technical report 2006-11, Stanford University Department of Statistics,
2006.

[63] M. F. Duarte, M. A. Davenport, M. B. Wakin, and R. G. Baraniuk. Sparse signal detection from 
incoherent projections. In Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), May 2006. 

[64] M. F. Duarte, M. B. Wakin, and R. G. Baraniuk. Fast reconstruction of piecewise smooth signals from 
random projections. In Proc. SPARS05, Rennes, France, Nov. 2005. 

[65] F. C. A. Fernandes, R. L. C. van Spaendonck, and C. S. Burrus. A new framework for complex wavelet 
transforms. IEEE Trans. Signal Processing, July 2003. 

[66] F. C. A. Fernandes, M. B. Wakin, and R. G. Baraniuk. Non-redundant, linear-phase, semi-orthogonal,
directional complex wavelets. In Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP),
Montreal, Quebec, Canada, May 2004.

[67] A. Garnaev and E. D. Gluskin. The widths of Euclidean balls. Doklady An. SSSR, 277:1048-1052,
1984.

[68] M. W. Hirsch. Differential Topology, volume 33 of Graduate Texts in Mathematics. Springer, 1976. 

[69] P. Indyk and R. Motwani. Approximate nearest neighbors: Towards removing the curse of
dimensionality. In Proc. Symp. Theory of Computing, pages 604-613, 1998.

[70] W. B. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. In Proc.
Conf. Modern Analysis and Probability, pages 189-206, 1984.

[71] B. Kashin. The widths of certain finite dimensional sets and classes of smooth functions. Izvestia, 
(41):334-351, 1977. 

[72] N. Kingsbury. Image processing with complex wavelets. Phil. Trans. R. Soc. Lond. A, 357, Sept. 1999. 

[73] N. Kingsbury. Complex wavelets for shift invariant analysis and filtering of signals. Appl. Comp. Harm.
Anal., 10:234-253, 2001.



[74] A. N. Kolmogorov and V. M. Tihomirov. ε-entropy and ε-capacity of sets in functional spaces. Amer.
Math. Soc. Transl. (Ser. 2), 17:277-364, 1961.



[75] J. Kovačević and A. Chebira. Life beyond bases: The advent of frames. 2006. Preprint.

[76] C. La and M. N. Do. Signal reconstruction using sparse tree representation. In Proc. Wavelets XI at
SPIE Optics and Photonics, San Diego, August 2005. SPIE.

[77] E. Le Pennec and S. Mallat. Sparse geometric image representations with bandelets. IEEE Trans.
Image Processing, 14(4):423-438, April 2005.

[78] S. LoPresto, K. Ramchandran, and M. T. Orchard. Image coding based on mixture modeling of wavelet
coefficients and a fast estimation-quantization framework. In Proc. Data Compression Conf., pages
221-230, Snowbird, Utah, March 1997.

[79] M. Lustig, D. L. Donoho, and J. M. Pauly. Rapid MR imaging with compressed sensing and randomly
under-sampled 3DFT trajectories. In Proc. 14th Ann. Mtg. ISMRM, May 2006.

[80] S. Mallat. A wavelet tour of signal processing. Academic Press, San Diego, CA, USA, 1999.

[81] S. Mallat. A wavelet tour of signal processing. Academic Press, San Diego, CA, USA, 1999.

[82] S. Mallat. A wavelet tour of signal processing. Academic Press, San Diego, CA, USA, 1999.

[83] S. Mallat. A wavelet tour of signal processing. Academic Press, San Diego, CA, USA, 1999.

[84] D. Marr. Vision. W. H. Freeman and Company, San Francisco, 1982.

[85] N. Mehrseresht and D. Taubman. An efficient content-adaptive motion compensated 3D-DWT with
enhanced spatial and temporal scalability. 2004. Preprint.

[86] F. Morgan. Riemannian Geometry: A Beginner's Guide. A K Peters, 2nd edition, 1998.

[87] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with confidence from
random samples. 2004. Preprint.

[88] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with confidence from
random samples. 2004. Preprint.

[89] B. Olshausen and D. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1?
Vision Res., 37:3311-3325, 1997.

[90] B. O'Neill. Elementary Differential Geometry. Harcourt Academic Press, 2nd edition, 1997.

[91] M. T. Orchard and H. Ates. Equiripple design of real and complex filter banks. Technical report, Rice
University, 2003.

[92] I. Ur Rahman, I. Drori, V. C. Stodden, D. L. Donoho, and P. Schroeder. Multiscale representations
for manifold-valued data. 2004. Preprint.

[93] S. T. Roweis and L. K. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science,
290(5500):2323-2326, December 2000.

[94] I. W. Selesnick. The design of approximate Hilbert transform pairs of wavelet bases. IEEE Trans.
Signal Processing, 50(5), May 2002.

[95] I. W. Selesnick and K. L. Li. Video denoising using 2D and 3D dual-tree complex wavelet transforms.
In Proc. SPIE Wavelet Applications Signal Image Processing X, 2003.



[96] J. Shapiro. Embedded image coding using zerotrees of wavelet coefficients. IEEE Trans. Signal
Processing, 41(12):3445-3462, Dec. 1993.

[97] D. Takhar, V. Bansal, M. Wakin, M. Duarte, D. Baron, K. F. Kelly, and R. G. Baraniuk. A compressed 
sensing camera: New theory and an implementation using digital micromirrors. In Proc. Computational 
Imaging IV at SPIE Electronic Imaging, San Jose, January 2006. SPIE. 

[98] J. B. Tenenbaum, V. de Silva, and J. C. Langford. A global geometric framework for nonlinear
dimensionality reduction. Science, 290(5500):2319-2323, December 2000.

[99] J. Tropp and A. C. Gilbert. Signal recovery from partial information via orthogonal matching pursuit. 
April 2005. Preprint. 

[100] J. A. Tropp, M. B. Wakin, M. F. Duarte, D. Baron, and R. G. Baraniuk. Random filters for compressive 
sampling and reconstruction. In Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), May 
2006. 

[101] M. Turk and A. Pentland. Eigenfaces for recognition. J. Cognitive Neuroscience, 3(1), 1991. 

[102] R. van Spaendonck, T. Blu, R. Baraniuk, and M. Vetterli. Orthogonal Hilbert transform filter banks
and wavelets. In Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP), 2003.

[103] R. Venkataramani and Y. Bresler. Further results on spectrum blind sampling of 2D signals. In Proc.
IEEE Int. Conf. Image Proc. (ICIP), volume 2, Chicago, Oct. 1998.

[104] M. B. Wakin. The Geometry of Low-Dimensional Signal Models. Ph.D. thesis, Department of Electrical
and Computer Engineering, Rice University, Houston, TX, August 2006.

[105] M. B. Wakin, D. L. Donoho, H. Choi, and R. G. Baraniuk. High-resolution navigation on
non-differentiable image manifolds. In Proc. Int. Conf. Acoustics, Speech, Signal Processing (ICASSP).
IEEE, 2005.

[106] K. Q. Weinberger and L. K. Saul. Unsupervised learning of image manifolds by semidefinite
programming. Int. J. Computer Vision, Special Issue: Computer Vision and Pattern Recognition (CVPR
2004), 70(1):77-90, 2006.

[107] Z. Xiong, K. Ramchandran, and M. T. Orchard. Space-frequency quantization for wavelet image 
coding. IEEE Trans. Image Processing, 6(5):677-693, 1997. 

[108] Z. Zhang and H. Zha. Principal manifolds and nonlinear dimension reduction via tangent space 
alignment. SIAM J. Scientific Comput., 26(1), 2004. 



Attributions 

Collection: Concise Signal Models 

Edited by: Michael Wakin 

URL: http://cnx.org/content/col10635/1.4/

License: http://creativecommons.org/licenses/by/2.0/

Module: "Introduction to Concise Signal Models" 

By: Michael Wakin 

URL: http://cnx.org/content/m18720/1.5/

Pages: 1-2

Copyright: Michael Wakin

License: http://creativecommons.org/licenses/by/2.0/

Module: "Signal Dictionaries and Representations" 

By: Michael Wakin 

URL: http://cnx.org/content/m18724/1.5/

Pages: 3-9

Copyright: Michael Wakin

License: http://creativecommons.org/licenses/by/2.0/

Module: "Manifolds" 

By: Michael Wakin 

URL: http://cnx.org/content/m18722/1.4/

Pages: 11-13

Copyright: Michael Wakin

License: http://creativecommons.org/licenses/by/2.0/

Module: "Low-Dimensional Signal Models" 

By: Michael Wakin 

URL: http://cnx.org/content/m18726/1.4/

Pages: 15-19

Copyright: Michael Wakin

License: http://creativecommons.org/licenses/by/2.0/

Module: "Approximation" 

By: Michael Wakin 

URL: http://cnx.org/content/m18727/1.5/

Pages: 21-27

Copyright: Michael Wakin

License: http://creativecommons.org/licenses/by/2.0/

Module: "Compression" 

By: Michael Wakin 

URL: http://cnx.org/content/m18729/1.3/

Pages: 29-30

Copyright: Michael Wakin

License: http://creativecommons.org/licenses/by/2.0/



Module: "Dimensionality Reduction" 

By: Michael Wakin 

URL: http://cnx.org/content/m18732/1.5/

Pages: 31-34

Copyright: Michael Wakin

License: http://creativecommons.org/licenses/by/2.0/

Module: "Compressed Sensing" 

By: Michael Wakin 

URL: http://cnx.org/content/m18733/1.5/

Pages: 35-40

Copyright: Michael Wakin

License: http://creativecommons.org/licenses/by/2.0/



Concise Signal Models 

This collection reviews fundamental concepts underlying the use of concise models for signal processing. 
Topics are presented from a geometric perspective and include low-dimensional linear, sparse, and
manifold-based signal models, approximation, compression, dimensionality reduction, and Compressed Sensing.






